WO2009124440A1 - Method, system and apparatus for content identification - Google Patents

Method, system and apparatus for content identification

Info

Publication number
WO2009124440A1
Authority
WO
WIPO (PCT)
Prior art keywords
content
entity
attribute
feature value
query request
Prior art date
Application number
PCT/CN2008/073001
Other languages
English (en)
French (fr)
Inventor
高洪涛
刘义俊
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to EP08873839A (EP2264634A4)
Publication of WO2009124440A1
Priority to US12/900,273 (US20110029555A1)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/10 Protecting distributed programs or content, e.g. vending or licensing of copyrighted material; Digital rights management [DRM]

Definitions

  • Embodiments of the present invention relate to the field of communications technologies, and in particular, to a method, system, and apparatus for content identification.
  • Background
  • Content identification (CI) technology uses content feature extraction to obtain feature values that distinguish one piece of content from another. The feature value of the genuine content is calculated first; it is then determined whether content to be distributed on a website is the same content as the genuine content, or a section of it. If so, the dissemination and use of the content being distributed is managed according to the copyright management rules of the genuine content.
  • The prior art also includes a method for controlling the uploading of protected content, in which a centralized third party establishes a copyright management database of content.
  • The Content Identifier Forum (CIDF) has developed an application framework for copyright protection. Its main idea is to mark the content to be distributed with an identifier and to bind the identifier to related content attributes, such as attributes of the content itself (size, type, and so on) and its creator.
  • This scheme uses the traditional hash algorithm MD5 (Message Digest Algorithm 5) to calculate the feature value for all types of content. If even one bit of the content changes, the content can no longer be verified and identified correctly, so detection under this scheme is easily evaded.
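The one-bit sensitivity described above can be illustrated with a minimal Python sketch. The sample content and the helper names `md5_hex`, `flip_bit`, and `differing_bits` are illustrative assumptions, not part of the patent; only `hashlib.md5` is a standard library call.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of data as a hex string."""
    return hashlib.md5(data).hexdigest()


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with exactly one bit inverted."""
    buf = bytearray(data)
    buf[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(buf)


def differing_bits(hex_a: str, hex_b: str) -> int:
    """Count the bit positions at which two equal-length hex digests differ."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


# Hypothetical sample content standing in for registered genuine content.
original = b"An example of registered genuine content."
altered = flip_bit(original, 0)  # the content now differs in a single bit

digest_original = md5_hex(original)
digest_altered = md5_hex(altered)

# An exact-match lookup keyed on the digest no longer finds the altered copy.
print(digest_original == digest_altered)
print(differing_bits(digest_original, digest_altered), "of 128 digest bits differ")
```

Because a traditional hash is designed so that similar inputs give unrelated digests, comparing digests for equality is the only meaningful test, which is why a trivially modified copy escapes detection.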
  • Embodiments of the present invention provide a method, system, and apparatus for content identification, so that content is identified accurately by selecting an identification method appropriate to its content type.
  • An embodiment of the present invention provides a method for content identification, including the following steps: selecting a feature extraction algorithm according to the content type and the business entity's management purpose for the content, and extracting a content feature value; and obtaining the content attribute of the registered content according to the content feature value.
  • An embodiment of the present invention further provides a content identification method, including: receiving a content attribute query request sent by a business entity, where the content attribute query request includes a content feature value and a query request type; searching, according to the content feature value carried in the content attribute query request, the content feature database of the content identification management (CIM) entity for the feature value of the stored content that is closest to the content feature value; and, after that feature value is found, obtaining the content attribute of the registered content according to the query request type carried in the content attribute query request, and returning the obtained content attribute to the business entity.
  • An embodiment of the present invention further provides a content registration method for content identification, including: receiving a content registration request sent by a registration subject; selecting a feature generation algorithm according to the content type and the registration purpose parameter carried in the content registration request, and generating a content feature value; and storing the generated content feature value and the content attribute of the content, thereby completing registration of the content submitted by the registration subject.
  • An embodiment of the present invention further provides a system for content identification, including: a business entity, configured to select a feature extraction algorithm according to the content type and the business entity's management purpose for the content, to extract a content feature value, and to acquire, according to the content feature value, the content attribute of the registered content on the content identification management (CIM) entity; and
  • a content identification management (CIM) entity, configured to receive a content attribute query request sent by the business entity, where the content attribute query request includes the content feature value and a query request type; to search, according to the content feature value carried in the content attribute query request, the content feature database of the CIM entity for the feature value of the stored content that is closest to the content feature value; and, after that feature value is found, to acquire the content attribute of the registered content according to the query request type carried in the content attribute query request and return the obtained content attribute to the business entity.
  • An embodiment of the present invention further provides a business entity, including: a feature value extraction module, configured to select a feature extraction algorithm according to the content type and the business entity's management purpose for the content, and to extract a content feature value; and a module configured to acquire the content attributes of the registered content on the CIM entity according to the content feature value extracted by the feature value extraction module.
  • An embodiment of the present invention further provides a content identification management (CIM) entity, including: a content registration module, configured to receive a content registration request sent by a registration subject and to register the content submitted by the registration subject; a content feature database, configured to store the content feature values and content attributes of the registered content; and a verification and query processing module, configured to receive a content attribute query request sent by the business entity, where the content attribute query request includes the content feature value and the query request type, to search, according to the content feature value carried in the content attribute query request, the content feature database of the CIM entity for the feature value of the stored content that is closest to the content feature value, and, after that feature value is found, to acquire the content attribute of the registered content according to the query request type carried in the content attribute query request and return the obtained content attribute to the business entity.
  • In the embodiments of the present invention, the business entity selects a feature extraction algorithm according to the content type of the received content and the business entity's management purpose for the content, extracts the content feature value, acquires the content attribute of the registered content according to the extracted content feature value, and manages the content of the business entity according to the obtained content attribute.
  • In this way, a general CIM entity selects an appropriate identification method to identify content accurately according to its content type, and registers the content submitted by the registration subject. After the business entity obtains the content attribute of the registered content, the business entity manages its content based on that content attribute.
  • FIG. 1 is a structural diagram of a system for content identification in an embodiment of the present invention
  • FIG. 2 is a flowchart of a method for content identification in an embodiment of the present invention
  • FIG. 3 is a flowchart of a method for content identification in another embodiment of the present invention.
  • FIG. 4 is a flowchart of content feature extraction and certificate generation in an embodiment of the present invention
  • FIG. 5 is a schematic diagram of a format of a content certificate according to an embodiment of the present invention
  • FIG. 6 is a schematic diagram of a verification process in an embodiment of the present invention.
  • FIG. 8 is a structural diagram of a content identification management CIM entity 11 in an embodiment of the present invention
  • FIG. 9 is a structural diagram of a business entity 12 in an embodiment of the present invention.
  • Detailed description
  • One embodiment of the present invention provides a method for content identification that can accurately identify various types of content.
  • The embodiment of the invention establishes a universal content identification system that a business entity can use to perform copyright management, content filtering, software verification, and the like on the content it handles, and the content identification system can be flexibly extended to other application fields.
  • The business entity refers to an entity that provides a specific service, such as a website, a user terminal, or a service gateway.
  • The embodiments of the present invention select the most suitable identification method according to the characteristics of each content type in order to identify the content accurately, which improves robustness, reduces the error rate, and reduces the burden and cost of the business entity.
  • As shown in FIG. 1, a system for content identification in an embodiment of the present invention includes: a CIM (Content Identification Manager) entity 11, a business entity 12, and a registration subject 13.
  • The CIM entity 11 can be maintained by a third party that is trusted by the business entity 12, the user, and the registration subject 13, such as a telecommunications carrier or a government agency.
  • The CIM entity 11 is configured to register the content of the registration subject 13, to generate and maintain the content feature database, and to receive content attribute query requests from the business entity 12. These requests cover content verification and content attribute queries, and the CIM entity 11 provides the business entity 12 with content feature query and verification, content verification, and content attribute query services.
  • According to the content feature value carried in the content attribute query request, the CIM entity 11 searches its content feature database for the feature value of the stored content that is closest to the content feature value. After that feature value is found, the CIM entity 11 obtains the content attribute of the registered content according to the query request type carried in the content attribute query request, and returns the obtained content attribute to the business entity 12 so that the business entity 12 can manage its content.
  • The content attribute query request includes a content feature value, a feature value generation algorithm, and a query request type.
  • The business entity 12 is configured to select a feature extraction algorithm according to the content type and its management purpose for the content, to extract a content feature value, to obtain the content attribute of the registered content on the CIM entity 11 according to the content feature value, and to manage the content of the business entity 12 according to the acquired content attribute.
  • The registration subject 13 is configured to send a content registration request to the CIM entity 11, requesting the CIM entity 11 to register the content submitted by the registration subject 13, and to provide the corresponding content attributes.
  • The feature values of the content are then generated by the CIM entity 11 and stored in the content feature database. Whether the purpose is copyright protection, filtering, or software verification for anti-virus purposes, the content to be protected or filtered must be registered with the CIM entity 11.
  • The registration subject 13 and the registration method can differ depending on the purpose of the application.
  • The registration subject 13 is generally a content provider or an individual content producer, and the registered content may be submitted by any available transmission method, such as FTP (File Transfer Protocol) or through the CIM entity 11.
  • The registration subject 13 can also be an ordinary mobile phone user, an operator's customer service staff, and so on, and the registered content can be submitted by SMS (Short Message Service), MMS, upload through the portal of the CIM entity 11, or SOAP.
  • the information that the registration subject 13 must submit includes: the content itself and the parameters used to indicate the purpose of registration.
  • Registration purposes include, but are not limited to, copyright protection, filtering, or software verification.
  • According to the registration purpose, the embodiment of the present invention selects different feature extraction algorithms and different software verification processing methods.
  • The optional information submitted by the registration subject 13 includes:
  • The type and format of the content. Content types include but are not limited to: video, audio, text, software, or hybrid.
  • The format of the content refers to the file format, for example mp3 or rmvb (RealMedia Variable Bitrate).
  • copyright management rules refer to the management rules of how content users use and disseminate content. For example: You can upload and use unlimited ads and click on relevant ads. Copyright management rules can also be associated with specific users or business entities, such as allowing uploading on a website, allowing a user to download an item N times, and so on.
  • There may be multiple CIM entities 11 on the network. Different CIM entities 11 can each be responsible for content identification management of one domain.
  • The registration subject 13 does not need to register an item with every CIM entity 11. It only needs to register with one CIM entity 11, for example CIM-A, and CIM-A then synchronizes the registration to the other CIM entities, for example CIM-B.
  • the business entity 12 that verifies and queries the content using the CIM entity 11 may be a website, a gateway, a user terminal, or the like.
  • As shown in FIG. 2, a method for content identification in an embodiment of the present invention specifically includes the following steps:
  • Step S201: Select a feature extraction algorithm according to the content type and the management purpose of the business entity 12 for the content, and extract the content feature value.
  • The algorithm used by the business entity 12 to extract the content feature value should be selected according to the type of the content and the business entity's management purpose for the content.
  • The specific algorithm is the same as the feature extraction algorithm used by the CIM entity 11 in the registration process.
  • Step S202: Acquire the content attribute of the registered content according to the extracted content feature value.
  • The content attribute of the registered content can be acquired as follows:
  • The business entity 12 looks up the content certificate on the business entity 12 and obtains the content attribute from it. Specifically, after the content certificate is found, its signature is verified, and after the signature is verified successfully, the content attribute is obtained from the content certificate.
  • The content attribute of the registered content may also be acquired as follows:
  • The business entity 12 sends a content attribute query request to the CIM entity 11 and receives the content attribute returned by the CIM entity 11. The content attribute query request includes a content feature value, a feature value generation algorithm, and a query request type.
  • The query request types include: upload content copyright verification, download content copyright verification, filtering management, and/or software verification.
  • Depending on the query request type, the content attribute query request must also include the business entity identifier; or
  • the content attribute query request also includes the software name and version information; or
  • the content attribute query request must also include the user identifier.
  • Step S203: Manage the content of the business entity 12 according to the acquired content attribute.
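Steps S201 to S203 can be sketched from the business entity's side as follows. This is a minimal Python illustration under stated assumptions: the algorithm table, the request type strings, the field names of `ContentAttributeQuery`, and the allow/block policy in `manage_content` are all hypothetical, and SHA-256 merely stands in for whichever algorithm the CIM entity used at registration.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


def sha256_feature(content: bytes) -> str:
    """Traditional hash used here as a placeholder feature extractor."""
    return hashlib.sha256(content).hexdigest()


# Hypothetical table: (content type, management purpose) -> (name, extractor).
ALGORITHMS: Dict[Tuple[str, str], Tuple[str, Callable[[bytes], str]]] = {
    ("text", "copyright"): ("sha256", sha256_feature),
    ("software", "software_verification"): ("sha256", sha256_feature),
}


@dataclass
class ContentAttributeQuery:
    """Fields named in the text; the optional ones depend on the request type."""
    feature_value: str
    feature_algorithm: str
    request_type: str
    business_entity_id: Optional[str] = None
    user_id: Optional[str] = None
    software_name: Optional[str] = None
    software_version: Optional[str] = None


def build_query(content: bytes, content_type: str, purpose: str,
                request_type: str, **extra: str) -> ContentAttributeQuery:
    # S201: choose the extractor from content type and management purpose.
    name, extract = ALGORITHMS[(content_type, purpose)]
    # S202 (first half): build the request that would be sent to the CIM entity.
    return ContentAttributeQuery(extract(content), name, request_type, **extra)


def manage_content(attributes: Optional[dict]) -> str:
    """S203: act on the returned attributes (policy values are hypothetical)."""
    if attributes is None:
        return "allow"  # nothing registered matches this content
    return "allow" if attributes.get("upload_allowed", False) else "block"


query = build_query(b"uploaded text", "text", "copyright",
                    "upload_copyright_verification",
                    business_entity_id="site-1")
print(query.feature_algorithm, query.request_type)
print(manage_content({"upload_allowed": False}))
```

Keeping the algorithm name inside the request lets the receiving side check that both parties computed the feature value the same way, which the text requires when it says the algorithm must match the one used at registration.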
  • As shown in FIG. 3, a method for content identification in another embodiment of the present invention specifically includes the following steps:
  • Step S301: Receive a content attribute query request sent by the business entity 12, where the content attribute query request includes a content feature value and a query request type.
  • The query request types include upload content copyright verification, download content copyright verification, filtering management, and software verification.
  • Step S302: According to the content feature value carried in the content attribute query request, search the content feature database of the CIM entity 11 for the feature value of the stored content that is closest to the content feature value.
  • Step S303: After the feature value of the stored content that is closest to the content feature value is found, acquire the content attribute of the registered content according to the query request type carried in the content attribute query request, and return the obtained content attribute to the business entity 12 so that the business entity 12 can manage its content.
  • The CIM entity 11 obtains the content attribute according to the query request type carried in the content attribute query request, specifically as follows:
  • The content attribute query request must also include the business entity identifier, and the CIM entity 11 obtains the copyright management rule of the business entity 12 corresponding to that identifier, so that the content attribute returned by the CIM entity 11 further includes a copyright statement and the copyright management rule of the business entity 12 corresponding to the business entity identifier; or
  • the content attribute query request further includes a software name and version information. According to the software name and version information carried in the content attribute query request, the CIM entity 11 searches its content feature database for the feature value of the stored content that is closest to the content feature value. After that feature value is found, the CIM entity 11 obtains the content attribute according to the query request type carried in the content attribute query request and returns the obtained content attribute to the business entity 12; or
  • the content attribute query request must also include the user identifier.
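The lookup in steps S302 and S303 can be sketched as a nearest-match search. This Python sketch rests on several assumptions that the text does not fix: feature values are modelled as integers compared by Hamming distance, `max_distance` is an illustrative cut-off, the record layout is hypothetical, and tying the business entity's copyright rule to the `upload_copyright_verification` request type is a guess at which request type carries the business entity identifier.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class RegisteredContent:
    content_id: str
    feature_value: int  # fingerprint modelled as an integer bit string
    copyright_statement: str
    rules_by_business_entity: Dict[str, str] = field(default_factory=dict)


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


def find_closest(db: List[RegisteredContent], feature_value: int,
                 max_distance: int) -> Optional[RegisteredContent]:
    """S302: return the stored record nearest to feature_value, if close enough."""
    best = min(db, key=lambda r: hamming(r.feature_value, feature_value),
               default=None)
    if best is None or hamming(best.feature_value, feature_value) > max_distance:
        return None
    return best


def handle_query(db: List[RegisteredContent], feature_value: int,
                 request_type: str, business_entity_id: Optional[str] = None,
                 max_distance: int = 4) -> Optional[dict]:
    """S303: select the attributes to return according to the request type."""
    record = find_closest(db, feature_value, max_distance)
    if record is None:
        return None
    attributes = {"content_id": record.content_id}
    if request_type == "upload_copyright_verification":
        attributes["copyright_statement"] = record.copyright_statement
        attributes["copyright_rule"] = record.rules_by_business_entity.get(
            business_entity_id)
    return attributes


db = [
    RegisteredContent("c1", 0b11110000, "(c) A", {"site-1": "no upload"}),
    RegisteredContent("c2", 0b00001111, "(c) B"),
]
print(handle_query(db, 0b11110001, "upload_copyright_verification", "site-1"))
```

A linear scan is enough to show the idea; a real content feature database would need an index suited to approximate matching.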
  • The CIM entity 11 receives the content registration request sent by the registration subject 13 and registers the content submitted by the registration subject 13.
  • The CIM entity 11 may register the content submitted by the registration subject 13 as follows:
  • The CIM entity 11 selects a feature generation algorithm based on the content type and the registration purpose parameter, generates a content feature value, and stores the generated content feature value and the content attribute of the content.
  • When the content type is text and the purpose of registration is copyright protection, such content is generally a novel, an essay, a news report, or the like.
  • A traditional hash algorithm such as SHA-1, SHA-256, or MD5 can be used to calculate the hash value of the content, and the calculated hash value is used as the feature value of the content.
  • The granularity at which the feature value is calculated can be adjusted according to policy: a hash value can be calculated for each segment or each sentence, and the array of per-segment or per-sentence hash values is used as the feature value of the whole content.
  • Alternatively, the feature value of the content is extracted using a content-based hash algorithm.
  • The content-based hash algorithm includes a content feature extraction algorithm and a robust hash algorithm.
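The choice between a traditional hash and a content-based (robust) hash can be sketched as a small dispatcher. Everything specific here is an assumption for illustration: the type and purpose strings, the decision to route audio and video to the robust branch, and `block_mean_hash`, which is a toy stand-in (block averages thresholded against the overall average) rather than the feature extraction and robust hash algorithm the patent has in mind.

```python
import hashlib
from typing import Callable


def traditional_hash(content: bytes) -> str:
    """Exact-match feature value: any change to the input changes the digest."""
    return hashlib.sha256(content).hexdigest()


def block_mean_hash(samples: bytes, blocks: int = 64) -> str:
    """Toy robust hash: one bit per block, set when the block mean exceeds the overall mean."""
    means = []
    for i in range(blocks):
        chunk = samples[i * len(samples) // blocks:(i + 1) * len(samples) // blocks]
        means.append(sum(chunk) / len(chunk) if chunk else 0.0)
    overall = sum(means) / blocks
    bits = 0
    for mean in means:
        bits = (bits << 1) | (1 if mean > overall else 0)
    return format(bits, "0{}x".format(blocks // 4))


def select_feature_algorithm(content_type: str,
                             purpose: str) -> Callable[[bytes], str]:
    """Pick a feature generation algorithm from content type and registration purpose."""
    if content_type == "text" and purpose == "copyright_protection":
        return traditional_hash
    if content_type in ("audio", "video"):  # assumed mapping, not stated in the text
        return block_mean_hash
    raise ValueError("no algorithm configured for %s/%s" % (content_type, purpose))


# A slowly varying signal and a copy that differs in a single low-order bit.
samples = bytes(i % 256 for i in range(1024))
altered = bytes([samples[0] ^ 1]) + samples[1:]

print(traditional_hash(samples) == traditional_hash(altered))
print(block_mean_hash(samples) == block_mean_hash(altered))
```

The two comparisons show the trade-off: the traditional hash treats the altered copy as unrelated content, while the block-based hash maps both copies to the same value because the change is too small to move any block mean across the threshold.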
  • The CIM entity 11 may further generate a content certificate for the registered content. The content certificate includes the feature value of the registered content, the content attribute, the registration subject, and the method used to verify the authenticity of the content attribute.
  • When there are a plurality of CIM entities on the network, the CIM entity 11 synchronizes the registration information of the content of the registration subject 13 to the other CIM entities. Each of the other CIM entities then generates a content certificate for the content of the registration subject 13 from that registration information, which includes the way in which the CIM entity 11 verified the authenticity of the content attributes.
  • The registration subject 13 therefore registers with only one CIM entity 11 rather than with every CIM entity, and a business entity 12 in the domain of another CIM entity can still manage its content against the registered content.
  • The prior art uses audio recognition technology to block the uploading of protected content. First, a database of the feature values and copyright management rules of copyright-protected content is created in the website.
  • The content uploaded by users can then be managed.
  • A copyright management module extracts the feature value FP1 (fingerprint) of the uploaded content by the same algorithm and then searches the database for the stored feature value FP2 that is closest to FP1.
  • If FP2 exists in the database and the difference between FP2 and FP1 is less than a threshold, the content corresponding to FP1 can be considered the same content as the content corresponding to FP2.
  • The copyright management rules stored in the database for that content are then applied to the content uploaded by the user.
  • The inventors of the present invention have found that the above prior art has the following disadvantages. Because the volume of audio, video, and similar content is large, maintaining and managing a content feature database greatly increases the cost of the website. The website, as a business entity, can only use the database for matters related to its own business and cannot offer it to other applications. Some weaker websites do not have the ability to build such a database, and content providers (CP) are unlikely to negotiate this management scheme with every website, so a large number of websites do not manage copyrighted content.
  • The present embodiment performs content identification through one or more separate CIM entities 11, so that the business entities in the network can manage content, and the content data held for identification in the CIM entity 11 can be used by all business entities, which reduces the burden on each business entity.
  • After receiving a content registration request, the CIM entity 11 generates a content feature value and stores the content feature value and other data in the content feature database. Specifically, the following steps can be included:
  • Step S401: Authenticate the registration subject 13, and determine the authenticity of the content submitted by the registration subject 13 and of its content attributes.
  • After the registration subject 13 is authenticated, the CIM entity 11 registers the content for it. At the same time, the CIM entity 11 needs to verify the authenticity of the content attributes provided by the registration subject 13. For copyright protection, the registration subject 13's ownership of the content, that is, the copyright statement, is the content attribute that most needs to be verified: the registration subject 13 must present legally valid proof of ownership before the CIM entity 11 accepts its claim of ownership of the content. If the proof of ownership cannot be provided, the CIM entity 11 may reject the registration for copyright protection purposes. For other attributes such as the content type and the author, the attributes can be considered true once the CIM entity 11 has successfully authenticated the registration subject 13, or their authenticity can be determined after manual verification.
  • Step S402: Generate the feature value of the content according to the content type and the registration purpose parameter. Before generating the feature value, the CIM entity 11 selects the feature generation algorithm based on the content type and the registration purpose parameter.
  • The hash value of the content can be calculated by a traditional hash algorithm such as MD5 (Message Digest Algorithm 5). Alternatively, keywords can be extracted from the text information and the hash value of the keywords calculated.
  • The calculated hash value is used as the feature value of the content.
  • When the content type is text and the purpose of registration is copyright protection, such content is generally a novel, an essay, a news report, or the like.
  • A traditional hash algorithm such as MD5 is used to calculate the hash value of the content, and the calculated hash value is used as the feature value of the content.
  • The granularity at which the feature value is calculated can be adjusted according to policy: a hash value can be calculated for each segment or each sentence, and the array of per-segment or per-sentence hash values is used as the feature value of the whole content.
  • Alternatively, the feature value of the content is extracted using a content-based hash algorithm.
  • The content-based hash algorithm includes a content feature extraction algorithm and a robust hash algorithm.
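The adjustable granularity described for text can be sketched as follows. This is a minimal Python illustration with assumed details: sentences are split on terminal punctuation, a "segment" is taken to be a blank-line-separated paragraph, and `shared_fraction` is a hypothetical helper showing how two hash arrays might be compared.

```python
import hashlib
import re
from typing import List


def split_sentences(text: str) -> List[str]:
    """Split on whitespace that follows '.', '!' or '?' (a simplification)."""
    return [part for part in re.split(r"(?<=[.!?])\s+", text.strip()) if part]


def split_paragraphs(text: str) -> List[str]:
    """Split on blank lines."""
    return [part.strip() for part in re.split(r"\n\s*\n", text) if part.strip()]


def feature_array(text: str, granularity: str = "sentence") -> List[str]:
    """Return one MD5 hex digest per sentence or per paragraph."""
    if granularity == "sentence":
        units = split_sentences(text)
    else:
        units = split_paragraphs(text)
    return [hashlib.md5(unit.encode("utf-8")).hexdigest() for unit in units]


def shared_fraction(registered: List[str], candidate: List[str]) -> float:
    """Fraction of the registered hashes that also occur in the candidate."""
    if not registered:
        return 0.0
    return len(set(registered) & set(candidate)) / len(set(registered))


registered = feature_array("First sentence. Second sentence! Third?")
candidate = feature_array("First sentence. A rewritten sentence! Third?")
print(len(registered), shared_fraction(registered, candidate))
```

With a single hash over the whole text, rewriting one sentence would make the copy unrecognizable; with per-sentence hashes, the unchanged sentences still match, so a partial copy can be detected.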
  • The CIM entity 11 assigns an ID (identifier) to the content.
  • The purpose of assigning an ID is to make it easier for the CIM entity 11 to organize the content and to find content by ID, for example when performing software verification or advertisement triggering. When the system is used for copyright protection or filtering, however, the CIM entity 11 should look up the content by its content feature value instead of by its ID.
  • The IDs may be allocated according to the URN (Uniform Resource Name) format.
  • Step S404 may further be included: the CIM entity 11 generates a content certificate for the content.
  • The content certificate may be appended to the head of the content file and propagated with the content; it may also be distributed separately, for example periodically to the business entity 12.
  • The function of the content certificate is to make it unnecessary for the subsequent content verification process to connect to the CIM entity 11, which reduces the load on the CIM entity 11 and improves the efficiency of verification.
  • The content certificate may include a content ID, the feature value of the content, the necessary attributes of the content, the registration subject 13, the manner in which the CIM entity 11 determined the authenticity of the content attributes, and the signature of the CIM entity 11 over this information.
  • The format of a content certificate is shown in FIG. 5.
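The certificate fields listed above can be sketched as a signed record. This Python sketch is not the format of FIG. 5, which is not reproduced in the text: the field names, the JSON serialization, and the sample values are assumptions. HMAC-SHA256 from the standard library stands in for the CIM entity's signature; a real deployment would more plausibly use an asymmetric signature so that a business entity can verify a certificate without holding the CIM entity's secret.

```python
import hashlib
import hmac
import json
from dataclasses import asdict, dataclass, replace
from typing import Dict


@dataclass(frozen=True)
class ContentCertificate:
    content_id: str
    feature_value: str
    attributes: Dict[str, str]
    registration_subject: str
    verify_means: str
    signature: str = ""


def _payload(cert: ContentCertificate) -> bytes:
    """Serialize every field except the signature in a stable order."""
    body = asdict(cert)
    body.pop("signature")
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign(cert: ContentCertificate, key: bytes) -> ContentCertificate:
    """Return a copy of cert carrying the signature over its other fields."""
    digest = hmac.new(key, _payload(cert), hashlib.sha256).hexdigest()
    return replace(cert, signature=digest)


def verify(cert: ContentCertificate, key: bytes) -> bool:
    """Check the signature before trusting the attributes in the certificate."""
    expected = hmac.new(key, _payload(cert), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, cert.signature)


cim_key = b"demo-key-not-for-production"  # hypothetical key material
cert = sign(ContentCertificate(
    content_id="urn:example:content:1",
    feature_value="9e107d9d372bb6826bd81d3542a419d6",
    attributes={"type": "text", "copyright": "(c) Example Author"},
    registration_subject="example-content-provider",
    verify_means="CIM-A-PKI-OwnerEvid",
), cim_key)

print(verify(cert, cim_key))
tampered = replace(cert, attributes={"type": "text", "copyright": "(c) Someone Else"})
print(verify(tampered, cim_key))
```

Because the signature covers the feature value and the attributes together, a business entity that holds a valid certificate can check content locally and only needs to contact the CIM entity when no certificate is available.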
  • The format of the Verify means field is CIMID-Auth-AttrVerify: the ID of the CIM (CIMID), the method by which the CIM authenticates the registration subject 13 (Auth), and the method by which the CIM verifies the authenticity of the content attributes (AttrVerify, Attribute Verification).
  • Together these form a family of verification methods. A specific family needs to be defined according to the specific authentication and verification methods in use.
  • For example, the family can be defined as follows:
  • CIMID: in an actual content certificate, the CIMID needs to be set to the identity of the specific CIM entity 11.
  • Auth: for example PKI (Public Key Infrastructure), Smartcard (smart card), or Kerberos (a network authentication protocol).
  • AttrVerify: OwnerEvidwithManual indicates that the CIM entity 11 verifies the authenticity of the content attributes by requiring the registration subject 13 to provide legally valid proof of ownership and by manually verifying the authenticity of the other content attributes.
  • OwnerEvid indicates that only the copyright statement is verified through proof of ownership; the other attributes are automatically accepted as true once the registration subject 13 has been authenticated.
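A value in the CIMID-Auth-AttrVerify format can be split into its three parts as sketched below. The token spellings (`PKI`, `Smartcard`, `Kerberos`, `OwnerEvidwithManual`, `OwnerEvid`) follow the example family in the text, written without spaces as an assumption; the function name and the decision to reject unknown tokens are illustrative choices.

```python
from typing import NamedTuple


class VerifyMeans(NamedTuple):
    cim_id: str
    auth: str
    attr_verify: str


# Example family taken from the text; a deployment would define its own.
KNOWN_AUTH = {"PKI", "Smartcard", "Kerberos"}
KNOWN_ATTR_VERIFY = {"OwnerEvidwithManual", "OwnerEvid"}


def parse_verify_means(value: str) -> VerifyMeans:
    """Split 'CIMID-Auth-AttrVerify' into its parts and validate the tokens."""
    # Split from the right so that a CIMID such as 'CIM-A' may contain hyphens.
    parts = value.rsplit("-", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError("expected CIMID-Auth-AttrVerify, got %r" % value)
    cim_id, auth, attr_verify = parts
    if auth not in KNOWN_AUTH:
        raise ValueError("unknown authentication method %r" % auth)
    if attr_verify not in KNOWN_ATTR_VERIFY:
        raise ValueError("unknown attribute verification method %r" % attr_verify)
    return VerifyMeans(cim_id, auth, attr_verify)


parsed = parse_verify_means("CIM-A-PKI-OwnerEvidwithManual")
print(parsed.cim_id, parsed.auth, parsed.attr_verify)
```

A business entity can use the parsed CIMID to decide whether it trusts the registering CIM, and the AttrVerify token to decide how much weight to give the attributes.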
  • Step S405 the CIM entity 11 stores the content ID, the feature value of the content, and the content attribute in the content feature database.
  • step S406 the CIM entity 11 returns the registration result to the registration body 13.
  • In another embodiment, a plurality of CIM entities exist in the network, and the registration process may further include the following: after the registration subject 13 registers the content with one CIM entity 11 (hereinafter CIM-A), CIM-A synchronizes the registration information to the other CIM entities (CIM-B is used as an example below).
  • In this embodiment, the registration subject 13 needs to register with only one CIM entity 11 rather than with every CIM entity one by one, and the business entities 12 in the domains of the other CIM entities can still manage their content against the registered content.
  • The registration information that CIM-A needs to pass to CIM-B includes: the content ID, the feature value of the content, the content attributes and registration subject information submitted at registration, the way in which the content was submitted for registration, and the method by which CIM-A determined the authenticity of the registered content and of the content attributes.
  • Once CIM-A's method of determining the authenticity of the content attributes has been passed to CIM-B, CIM-B can judge the authenticity of the content attributes according to its own rules together with that method.
  • CIM-A can pass this registration information to CIM-B in several ways, such as FTP, SOAP, or SHTTP (Secure HyperText Transfer Protocol), but a security mechanism must be used to ensure the integrity of the information in transit.
  • For example, a TLS (Transport Layer Security) connection can be established between CIM-A and CIM-B, over which CIM-A sends the information about each registered content item to CIM-B; the registration information of multiple content items can be sent within one TLS connection.
  • Alternatively, the registration information of a content item can be sent as a certificate; because the CIM signature contained in the certificate ensures the integrity of the information in it, no other security measure is needed.
  • After receiving the registration information of each content item from CIM-A, CIM-B saves it in its database.
  • CIM-B generates a content certificate for the registration information of each content item.
  • In the Verify Means field of the generated content certificate, the CIMID should be set to CIM-A rather than CIM-B. This way, when verifying the certificate, the business entity 12 knows that the content was registered with CIM-A and can decide, according to its own policy, whether to trust CIM-A.
  • CIM-B signs the content registration information and adds the signature to the certificate it generates. If CIM-A sent the content registration information to CIM-B in the form of a content certificate, CIM-B can remove CIM-A's signature from the certificate and then add its own.
  • The content certificate carries CIM-B's signature so that the business entity 12 can verify the certificate. A CIM entity 11 is usually responsible for one domain; a business entity 12 in that domain generally holds the certificate of that CIM entity 11 but not the certificates of CIM entities in other domains, so it cannot verify a signature made by a CIM entity of another domain over the registration information in a content certificate. Therefore, after receiving content registration information synchronized from another CIM entity, each CIM entity 11 needs to sign that information using its own certificate.
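The re-signing step performed by CIM-B can be sketched as follows. This is an illustrative sketch under stated assumptions: the certificate is modelled as a flat dictionary with hypothetical field names, the signing operation is passed in as a callable, and `toy_sign_cim_b` is only a placeholder digest, not a real public-key signature of the kind a CIM certificate would provide.

```python
import hashlib
import json
from typing import Callable, Dict


def canonical_bytes(fields: Dict[str, str]) -> bytes:
    """Serialize the fields deterministically so the signature is reproducible."""
    return json.dumps(fields, sort_keys=True, separators=(",", ":")).encode("utf-8")


def reissue_certificate(cert: Dict[str, str],
                        sign: Callable[[bytes], str]) -> Dict[str, str]:
    """Replace the signature on a synchronized certificate with the local CIM's own.

    Every other field is kept as received, including verify_means, so the
    CIMID inside it still names the CIM entity that registered the content.
    """
    fields = {key: value for key, value in cert.items() if key != "signature"}
    reissued = dict(fields)
    reissued["signature"] = sign(canonical_bytes(fields))
    return reissued


def toy_sign_cim_b(data: bytes) -> str:
    # Placeholder only: a keyed digest stands in for CIM-B's real signature.
    return hashlib.sha256(b"CIM-B-demo-key" + data).hexdigest()
```

Passing the signer as a parameter keeps the sketch independent of any particular signature scheme, which the text does not specify.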
  • FIG. 6 is a schematic diagram of the verification process in an embodiment of the present invention.
  • After receiving content, the business entity 12 needs to manage operations on that content because of various application requirements.
  • The business entity 12 may be a website, a user terminal, or a filtering gateway.
  • The management of content operations by the business entity 12 includes, but is not limited to: copyright verification and distribution management, filtering of spam or illegal information, and software verification for anti-virus purposes. The process includes the following steps:
  • Step S601: The business entity 12 receives the content.
  • Step S602: The business entity 12 extracts the content feature value of the received content.
  • The algorithm for extracting the content feature value should be selected according to the type of the content and the business entity 12's purpose in managing the content, and it should be the same as the feature extraction algorithm that the CIM entity 11 used when registering the content submitted by the registration subject 13.
  • The criteria by which the business entity 12 selects the extraction algorithm should be consistent with those used by the CIM entity 11. For example, both can follow a unified standard or a method agreed on in advance.
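One way to keep the two sides consistent is for both to call the same selection function. The sketch below is an assumption-laden illustration: the algorithm labels and function names are hypothetical, SHA-1 stands in for any traditional hash algorithm, and paragraphs are assumed to be separated by blank lines. Content-based (robust) hashing for image, audio, and video is only named, not implemented.

```python
import hashlib
from typing import List


def select_algorithm(content_type: str, purpose: str) -> str:
    """Shared selection rule used by both the CIM entity and the business entity."""
    if content_type == "text":
        # Copyright protection uses finer granularity; other purposes hash the whole text.
        return "sha1-per-paragraph" if purpose == "copyright" else "sha1-whole"
    if content_type in {"image", "audio", "video"}:
        return "content-based-hash"  # robust hashing, not implemented here
    raise ValueError(f"no algorithm defined for content type {content_type!r}")


def text_feature_value(text: str, algorithm: str) -> List[str]:
    """Return the feature value of a text as a list of hex digests."""
    if algorithm == "sha1-whole":
        units = [text]
    elif algorithm == "sha1-per-paragraph":
        units = [p.strip() for p in text.split("\n\n") if p.strip()]
    else:
        raise ValueError(f"unsupported text algorithm: {algorithm}")
    return [hashlib.sha1(unit.encode("utf-8")).hexdigest() for unit in units]
```

With per-paragraph hashing, a change to one paragraph alters only one element of the feature value, so the remaining paragraphs can still be matched.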
  • Step S603: The business entity 12 sends a content verification and content attribute query request to the CIM entity 11.
  • The mandatory parameters carried in the content attribute query request are the content feature value, the feature value generation algorithm, and the query request type.
  • The optional parameters are any other parameters required by the particular query request type.
  • The query request type indicates the purpose for which the business entity 12 requests content verification and the content attribute query; the CIM entity 11 processes the request according to this parameter.
  • The query request types include, but are not limited to:
  • Upload copyright validate: managing the upload of content according to its copyright when a user uploads the content to a website;
  • Download copyright validate: managing the copyright of the content and the user's download rights when a user downloads the content from the Internet;
  • Anti-spam filtering: filtering spam and illegal information;
  • Software verification: verifying software for anti-virus purposes.
  • When the query request type is Upload copyright validate, the query request should also carry the ID of the business entity 12 (often a video sharing website), which the CIM entity 11 uses to look up the copyright management information corresponding to that business entity 12.
  • When the query request type is Download copyright validate, the query request should also carry the ID of the user who downloads the content, which the CIM entity 11 uses to look up the security management rule corresponding to that user.
  • When the query request type is Software verification, the query request should also carry the software name, version number, and similar information.
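The mandatory and type-dependent parameters of the query request can be captured in a small data structure. This is a sketch only: the text does not define a wire format, so the field names, the type labels, and the `validate` method are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict

# Extra parameters each query request type needs, following the text above.
REQUIRED_EXTRA = {
    "upload_copyright_validate": ["business_entity_id"],
    "download_copyright_validate": ["user_id"],
    "anti_spam_filtering": [],
    "software_verification": ["software_name", "software_version"],
}


@dataclass
class ContentAttributeQuery:
    feature_value: str      # mandatory: content feature value
    feature_algorithm: str  # mandatory: feature value generation algorithm
    request_type: str       # mandatory: query request type
    extra: Dict[str, str] = field(default_factory=dict)  # type-dependent parameters

    def validate(self) -> None:
        """Raise ValueError if the type is unknown or a required parameter is missing."""
        if self.request_type not in REQUIRED_EXTRA:
            raise ValueError(f"unknown query request type: {self.request_type}")
        missing = [name for name in REQUIRED_EXTRA[self.request_type]
                   if name not in self.extra]
        if missing:
            raise ValueError(f"missing parameters: {', '.join(missing)}")
```

Because the text says the list of types is not exhaustive, a new type would be added by extending `REQUIRED_EXTRA` rather than changing the class.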
  • Step S604: The CIM entity 11 searches the database for the corresponding stored registered content information.
  • When searching the content feature database for the corresponding registered content information, the CIM entity 11 can search by feature value, or by the content ID and/or the content name. The choice depends on the application purpose, because different purposes target different practical problems.
  • When content is subject to copyright protection or filtering, it can be considered infringing, or to be information that needs filtering, as long as the key features of its visually perceptible part match those of the protected content. The name and ID of such content are commonly modified to evade management, so for such content the CIM entity 11 should search the content feature database by feature value.
  • For software verification and some other applications such as advertisement triggering, the name of the content is generally not maliciously modified in the application scenario, or a modification can be detected by the user or the business entity 12. For example, the software verification process is triggered automatically after a user downloads a piece of software. Users generally download software only after seeing that its name and version are what they need, and software download sites generally do not change software names to deceive users, so the main problem that software verification solves is preventing software from being embedded with a virus or a malicious plug-in. For such content, the CIM entity 11 should search the content feature database by content name or ID.
  • In the embodiments of the present invention, when the query is made by content identifier, the content identifier may be the name of the content plus auxiliary information such as the version number, the ID assigned to the content by the CIM entity 11, or a combination of the name, the auxiliary information, and the ID.
  • When the content feature database is searched by feature value and the feature value generation algorithm in the query request is a traditional hash algorithm such as MD5, the feature value of the content to be verified must be identical to the stored feature value.
  • If the feature value generation algorithm is a content-based hash algorithm, the two are regarded as feature values of the same content when the difference between the feature value of the content to be verified and the stored feature value is less than a threshold.
  • The size of the threshold is determined by the specific algorithm.
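The threshold comparison can be sketched as a nearest-match search. The text does not say how the difference between two feature values is measured, so this example assumes fixed-length binary fingerprints compared by Hamming distance; the function names are hypothetical.

```python
from typing import Dict, Optional


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def find_closest(query: int, stored: Dict[str, int], threshold: int) -> Optional[str]:
    """Return the ID of the closest stored fingerprint if it is within the threshold.

    The match must be strictly below the threshold, as in the text, so a
    threshold of 1 reproduces the exact-match rule used for traditional hashes.
    """
    best_id = None
    best_dist = None
    for content_id, fingerprint in stored.items():
        dist = hamming_distance(query, fingerprint)
        if best_dist is None or dist < best_dist:
            best_id, best_dist = content_id, dist
    if best_dist is not None and best_dist < threshold:
        return best_id
    return None
```

A linear scan is enough to show the rule; a database holding many fingerprints would need an index suited to the chosen distance measure.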
  • When the content feature database is searched by content identifier and the search fails, the CIM entity 11 can search again by feature value, but whether to do so should weigh the probability of success against the cost.
  • Software verification is usually triggered automatically after the user finishes downloading the software, or by a service gateway when the software passes through it, so at that point it can be assumed that the software is the one the user wants, that is, the software name, version number, and so on are correct. Therefore, if a search by ID or name finds no corresponding stored content, it is likely that the software has never been registered with the CIM entity. If the ID or name of the software has been changed, a search by ID will also find nothing. In that case the CIM entity 11 can search again by feature value (FP, fingerprint).
  • If the corresponding stored content is found, the CIM entity 11 concludes that the content itself is intact but its name or ID has been changed, and it is still useful to tell the user this conclusion. If nothing is found, either other information has been inserted into the content and its name has been changed, or the content has not been registered, and the CIM entity cannot determine what the content is. Whether the CIM entity 11 informs the user of these two results can be decided by its internal policy.
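The lookup order for software verification, including the optional second search by feature value, can be sketched as below. The result labels, the in-memory index shapes, and the `retry_by_feature` policy flag are hypothetical; exact equality is used because the sketch assumes a traditional hash as the feature value.

```python
from typing import Dict, Set, Tuple


def verify_software(name: str, version: str, feature_value: str,
                    by_name: Dict[Tuple[str, str], str],
                    by_feature: Set[str],
                    retry_by_feature: bool = True) -> str:
    """Classify a piece of software against the registered content.

    by_name maps (name, version) to the registered feature value;
    by_feature holds every registered feature value.
    """
    registered = by_name.get((name, version))
    if registered is not None:
        # Found by name and version: compare the two feature values.
        return "match" if registered == feature_value else "mismatch"
    if retry_by_feature and feature_value in by_feature:
        # Content is intact but its name or ID was changed.
        return "renamed"
    # Either altered and renamed, or never registered; the two cannot be told apart.
    return "unknown"
```

Setting `retry_by_feature` to False models a policy that judges the second search not worth its cost.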
  • Step S605: The CIM entity 11 obtains the relevant attributes of the content according to the query type.
  • When the query type is Upload copyright validate, the CIM entity 11 queries the copyright statement and the copyright management rules of the content, and checks whether the copyright management rules contain a specific management rule for the business entity 12 identified by the business entity ID carried in the query request.
  • When the query type is Download copyright validate, the CIM entity 11 queries the copyright statement and the copyright management rules of the content, and checks whether the copyright management rules contain a specific management rule for the user identified by the user ID carried in the query request.
  • When the query type is Anti-spam filtering, the CIM entity 11 queries the category and filtering requirements of the content.
  • When the query type is Software verification, the CIM entity 11 does not search for the content by the feature value in the query request in step S604; instead, it looks up the stored feature value by the name and version number of the software and compares the two feature values. The comparison result is placed in the response message.
  • Step S606: The CIM entity 11 returns a response message to the business entity 12.
  • The response message contains the processing result code and the related content attributes.
  • The business entity 12 can then perform the related management operations on its content according to the returned content attributes.
  • The CIM entity 11 can also return the content certificate of the content to the business entity 12 in the response message.
  • The business entity 12 may obtain the content attributes through the process shown in FIG. 7, which includes the following steps:
  • Step S701: Search for a content certificate. If the content received by the business entity 12 carries a content certificate, or the business entity 12 finds the corresponding content certificate locally by content identifier, name, or feature value, the business entity 12 can obtain the attributes of the content from the content certificate. If no content certificate is found, step S603 is performed directly. If a content certificate is found, step S702 is performed.
  • Step S702: Verify the signature of the CIM entity 11. If the signature is that of the CIM entity 11 of the domain to which the business entity 12 belongs, or the business entity 12 can obtain the certificate of the CIM entity 11 that signed the content certificate, the signature can be verified with that CIM certificate. If the signature is verified successfully, step S703 is performed; if verification fails, step S603 is performed directly.
  • Step S703: Check whether the feature value in the content certificate is consistent with the feature value extracted from the content. The two feature values are consistent if they are equal or if the difference between them is less than a threshold. If they are inconsistent, the content certificate carried with the content is not the certificate of that content, and the business entity 12 should connect to the CIM entity 11 to query, that is, perform step S603. If they are consistent, step S704 is performed.
  • Step S704: Obtain from the content certificate the service attributes required for managing the content, and perform the corresponding management operation according to them.
  • If the content attributes are obtained successfully from the content certificate, the business entity 12 does not need to connect to the CIM entity 11, which reduces the load on the CIM entity 11 and makes it more efficient for the business entity 12 to obtain content attributes and perform content management operations.
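The certificate-first flow of steps S701 to S704, with the fallback to the query of step S603, can be sketched as follows. This is an illustrative sketch: the certificate is a plain dictionary with hypothetical keys, and signature verification, feature comparison, and the CIM query are injected as callables because the text leaves their mechanisms open.

```python
from typing import Callable, Dict, Optional


def attributes_from_certificate(cert: Optional[Dict],
                                extracted_feature: str,
                                verify_signature: Callable[[Dict], bool],
                                features_consistent: Callable[[str, str], bool]
                                ) -> Optional[Dict]:
    """Return the attributes from a content certificate, or None if it cannot be used."""
    if cert is None:                      # S701: no certificate found
        return None
    if not verify_signature(cert):        # S702: CIM signature check failed
        return None
    if not features_consistent(cert["feature_value"], extracted_feature):
        return None                       # S703: certificate belongs to other content
    return cert["attributes"]             # S704: use the attributes in the certificate


def get_attributes(cert: Optional[Dict],
                   extracted_feature: str,
                   verify_signature: Callable[[Dict], bool],
                   features_consistent: Callable[[str, str], bool],
                   query_cim: Callable[[str], Dict]) -> Dict:
    """Try the certificate first; fall back to querying the CIM entity (S603)."""
    attrs = attributes_from_certificate(cert, extracted_feature,
                                        verify_signature, features_consistent)
    if attrs is not None:
        return attrs
    return query_cim(extracted_feature)
```

The CIM entity is contacted only when the certificate path fails, which is how the flow reduces load on the CIM entity.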
  • In the above content identification method, centralized CIM entities 11, synchronized across domains, provide a basic service with common functions for copyright management, filtering, software verification, advertisement triggering, and the like.
  • This reduces the burden and cost of the business entity 12.
  • The business entity 12 does not have to connect to the CIM entity 11 every time it verifies content, which reduces the load on the CIM entity 11.
  • The CIM entity 11 selects the most suitable identification method for each content type so as to identify content accurately, and the business entity 12 manages its content according to the attributes of each content item, which improves robustness and reduces the error rate.
  • FIG. 8 is a structural diagram of a content identification management (CIM) entity 11 in an embodiment of the present invention. The CIM entity 11 includes:
  • The content registration module 111, configured to receive a content registration request sent by the registration subject 13 and register the content submitted by the registration subject 13. Registration includes feature extraction, attribute authenticity verification, and content certificate generation.
  • The verification and query processing module 112, configured to: receive a content attribute query request sent by the business entity 12, where the request includes a content feature value and a query request type; search the content feature database of the CIM entity 11, according to the content feature value carried in the request, for the feature value of stored content that is closest to that content feature value; and, after finding it, obtain the content attributes of the registered content according to the query request type carried in the request.
  • The obtained content attributes are returned to the business entity 12 so that the business entity 12 can manage its content.
  • The content feature database 113, configured to store the content feature values and content attributes of registered content.
  • The content feature database 113 is responsible for storing content feature values, attributes, and certificates.
  • The storage may take the form of a database or of files in a file system, for example files in XML (Extensible Markup Language) format.
  • The content registration module 111 includes a feature value generation sub-module 1111, configured to select a feature generation algorithm according to the content type and the registration purpose parameter, generate a content feature value, and store the generated content feature value and the content attributes of the content in the content feature database 113.
  • The content registration module 111 further includes a certificate generation sub-module 1112, configured to generate a content certificate for the registered content, where the content certificate includes the feature value of the registered content, the content attributes, the registration subject, and the method used to verify the authenticity of the content attributes.
  • the verification and query processing module 112 includes:
  • the feature value finding sub-module 1121 is configured to search, in the content feature database 113, the feature value of the stored content that is closest to the content feature value according to the content feature value carried in the content attribute query request;
  • the attribute obtaining sub-module 1122 is configured to obtain the content attribute according to the query request type carried in the content attribute query request after the feature value finding sub-module 1121 finds the feature value of the stored content that is closest to the content feature value, and The obtained content attribute is returned to the business entity 12.
  • the CIM entity 11 further includes: a synchronization module 114, configured to synchronize registration information of the content to other CIM entities other than the CIM entity 11.
  • FIG. 9 is a structural diagram of a business entity 12 in an embodiment of the present invention. The business entity 12 includes: a feature value extraction module 121, configured to select a feature extraction algorithm according to the content type and the business entity 12's purpose in managing the content, and to extract a content feature value;
  • The attribute obtaining module 122, configured to obtain the content attributes of the registered content on the CIM entity 11 according to the content feature value extracted by the feature value extraction module 121;
  • The content management module 123, configured to manage the content of the business entity 12 according to the content attributes obtained by the attribute obtaining module 122.
  • The attribute obtaining module 122 may include a lookup obtaining sub-module 1221, configured to find a content certificate and obtain the content attributes from it.
  • The attribute obtaining module 122 may also include a query obtaining sub-module, configured to send a content attribute query request to the CIM entity 11 and receive the content attributes returned by the CIM entity 11, where the request includes the content feature value, the feature value generation algorithm, and the query request type.
  • The present invention can be implemented by hardware, or by software together with a necessary general-purpose hardware platform. The technical solution of the present invention can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (for example a CD-ROM, a USB flash drive, or a removable hard disk) and includes a number of instructions that cause a computer device (for example a personal computer, a server, or a network device) to perform the methods described in the embodiments of the present invention.

Description

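Method, System, and Apparatus for Content Identification
This application claims priority to Chinese Patent Application No. 200810089543.X, filed with the Chinese Patent Office on April 7, 2008 and entitled "Method, System, and Apparatus for Content Identification", which is incorporated herein by reference in its entirety.
Technical Field
Embodiments of the present invention relate to the field of communications technologies, and in particular, to a method, system, and apparatus for content identification.
Background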
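With the development of the Internet, applications involving multimedia content have become very rich, for example music, software, or movie downloads, video sharing, short messages, and multimedia messages. The problems that accompany these applications (copyright issues, filtering of spam and illegal information, and viruses and malicious plug-ins) are also becoming increasingly serious. All of these problems require content to be identified correctly: correctly identifying a piece of content makes it possible to determine whether it is copyright-protected, whether it is spam, and so on.
Downloading and sharing electronic content such as music, movies, software, and e-books is now a very popular Internet application. As these applications have developed, download and sharing websites have come to host a large amount of copyright-infringing content, which attracts a large number of users to those websites. Such content is generally published by a CP (Content Provider) and is uploaded and distributed freely on websites without permission, which greatly harms the legitimate rights and interests of the CP.
To address such copyright problems, CI (Content Identification) technology is gradually becoming a means for some major content providers to protect the copyright of their content. Content identification uses content feature extraction to extract a feature value that distinguishes one piece of content from other content. First, the feature value of the genuine content is calculated; then it is determined whether the content being distributed on a website is the same as the genuine content or is a part of it. If so, the distribution and use of the distributed content are managed according to the copyright management rules of the genuine content.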
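The prior art also includes a method in which a centralized third party builds a copyright management database for content in order to block uploads of protected content. The CIDF (Content Identifier Forum) has defined an application framework for copyright protection, which mainly uses an identifier to identify the content to be distributed and binds the identifier to the related content attributes, for example attributes of the content itself (such as size and type) and those of the creator.
However, the inventors of the present invention found that this technique has the following drawback: the scheme uses the traditional hash algorithm MD5 (Message Digest Algorithm 5) to calculate the feature value for all content types. If even one bit of the content changes, the content can no longer be verified and identified correctly, so detection based on this scheme is easy for the detected content to evade.
Summary of the Invention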
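Embodiments of the present invention provide a method, system, and apparatus for content identification, so that a suitable identification method is selected according to the content type and content is identified accurately.
To achieve the foregoing objective, one aspect of the embodiments provides a content identification method, including: selecting a feature extraction algorithm according to the content type and the business entity's purpose in managing the content, and extracting a content feature value; and obtaining the content attributes of registered content according to the content feature value.
In another aspect, the embodiments provide a content identification method, including: receiving a content attribute query request sent by a business entity, where the request includes the content feature value and a query request type; searching, according to the content feature value carried in the request, the content feature database of a content identification management (CIM) entity for the feature value of stored content that is closest to the content feature value; and, after finding it, obtaining the content attributes of the registered content according to the query request type carried in the request, and returning the obtained content attributes to the business entity.
In a further aspect, the embodiments provide a content registration method for content identification, including: receiving a content registration request sent by a registration subject; selecting a feature generation algorithm according to the content type and the registration purpose parameter carried in the request, and generating a content feature value; and storing the generated content feature value and the content attributes of the content, thereby completing registration of the content submitted by the registration subject.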
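In a further aspect, the embodiments provide a content identification system, including:
a business entity, configured to select a feature extraction algorithm according to the content type and the business entity's purpose in managing the content, extract a content feature value, and obtain, according to the content feature value, the content attributes of registered content on the content identification management (CIM) entity; and
a CIM entity, configured to receive a content attribute query request sent by the business entity, where the request includes the content feature value and a query request type; search the content feature database of the CIM entity, according to the content feature value carried in the request, for the feature value of stored content that is closest to the content feature value; and, after finding it, obtain the content attributes of the registered content according to the query request type carried in the request and return the obtained content attributes to the business entity.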
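In a further aspect, the embodiments provide a business entity, including: a feature value extraction module, configured to select a feature extraction algorithm according to the content type and the business entity's purpose in managing the content, and to extract a content feature value; and an attribute obtaining module, configured to obtain, according to the content feature value extracted by the feature value extraction module, the content attributes of registered content on the CIM entity.
In a further aspect, the embodiments provide a CIM entity, including: a content registration module, configured to receive a content registration request sent by a registration subject and register the content submitted by the registration subject; a content feature database, configured to store the content feature values and content attributes of registered content; and a verification and query processing module, configured to receive a content attribute query request sent by a business entity, where the request includes the content feature value and a query request type, search the content feature database of the CIM entity, according to the content feature value carried in the request, for the feature value of stored content that is closest to the content feature value, and, after finding it, obtain the content attributes of the registered content according to the query request type carried in the request and return the obtained content attributes to the business entity.
Compared with the prior art, the embodiments of the present invention have the following advantages. The business entity selects a feature extraction algorithm according to the content type of the received content and its purpose in managing the content, extracts a content feature value, obtains the content attributes of the registered content according to the extracted feature value, and manages its content according to the obtained attributes. A general-purpose CIM entity selects a suitable identification method for each content type to identify content accurately and registers the content submitted by the registration subject; after the business entity obtains the content attributes of the registered content, it manages its content according to those attributes.
Brief Description of the Drawings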
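FIG. 1 is a structural diagram of a content identification system in an embodiment of the present invention;
FIG. 2 is a flowchart of a content identification method in an embodiment of the present invention;
FIG. 3 is a flowchart of a content identification method in another embodiment of the present invention;
FIG. 4 is a flowchart of content feature extraction and certificate generation in an embodiment of the present invention;
FIG. 5 is a schematic diagram of the format of a content certificate in an embodiment of the present invention;
FIG. 6 is a schematic diagram of the verification process in an embodiment of the present invention;
FIG. 7 is a flowchart of obtaining content attributes in an embodiment of the present invention;
FIG. 8 is a structural diagram of the content identification management (CIM) entity 11 in an embodiment of the present invention;
FIG. 9 is a structural diagram of the business entity 12 in an embodiment of the present invention.
Detailed Description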
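An embodiment of the present invention provides a content identification method that can accurately identify various types of content. The embodiment establishes a general-purpose content identification system that a business entity can use for copyright management, content filtering, software verification, and similar operations on the content it handles, and the system can be flexibly extended to other application areas. A business entity is an entity that provides a specific service, for example a website, a user terminal, or a service gateway. For each content type, the embodiment selects the most suitable identification method to identify content accurately, which improves robustness, reduces the error rate, and lowers the burden and cost of the business entity.
FIG. 1 is a structural diagram of a content identification system in an embodiment of the present invention. The system includes a CIM (Content Identification Manager) entity 11, a business entity 12, and a registration subject 13. In one embodiment, the CIM entity 11 can be maintained by a trusted third party, that is, an organization trusted by the business entity 12, users, and the registration subject 13, such as a telecom operator or a government agency. The CIM entity 11 registers the content of the registration subject, generates and maintains the content feature database, and receives content attribute query requests (covering content verification and content attribute queries) from the business entity 12, providing the business entity 12 with services such as content feature query and verification, content verification, and content attribute query. According to the content feature value carried in a content attribute query request, the CIM entity 11 searches its content feature database for the feature value of stored content that is closest to that content feature value; after finding it, the CIM entity 11 obtains the content attributes of the registered content according to the query request type carried in the request and returns them to the business entity 12 so that the business entity 12 can manage its content. The content attribute query request includes the content feature value, the feature value generation algorithm, and the query request type.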
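The business entity 12 is configured to select a feature extraction algorithm according to the content type and its purpose in managing the content, extract a content feature value, obtain the content attributes of the registered content on the CIM entity 11 according to that feature value, and manage its content according to the obtained content attributes.
The registration subject 13 is configured to send a content registration request to the CIM entity 11, asking it to register the content submitted by the registration subject 13, and to provide the corresponding content attributes. The CIM entity 11 then generates the feature value of the content and stores it in the content feature database. Whether the goal is copyright protection, filtering, or software verification for anti-virus purposes, the content to be protected or filtered must be registered with the CIM entity 11.
The registration subject 13 and the way of registering can differ with the application purpose. For copyright protection and software verification, the registration subject 13 is generally a content provider or an individual content creator, and the content can be submitted by any available transport, for example FTP (File Transfer Protocol), upload through the portal of the CIM entity 11, or SOAP (Simple Object Access Protocol). For content filtering, the registration subject 13 can be an ordinary mobile phone user or an operator's customer service staff, and the content can be submitted by SMS (Short Messaging Service), multimedia message, upload through the portal of the CIM entity 11, SOAP, and so on.
At registration, the registration subject 13 must submit the content itself and a parameter indicating the registration purpose. Registration purposes include, but are not limited to, copyright protection, filtering, and software verification. Based on this parameter, the embodiment selects different feature extraction algorithms and different software verification processing methods.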
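The registration subject 13 can optionally submit the following information:
(1) The type and format of the content. Content types include, but are not limited to, video, audio, text, software, and mixed. The format of the content is the file format, for example mp3 (Moving Picture Experts Group Audio Layer III) or rmvb (RealMedia Variable Bitrate).
(2) An algorithm identifier. The registration subject can use this parameter to indicate its choice of feature extraction algorithm.
(3) For copyright protection, a copyright statement and copyright management rules must be submitted. Copyright management rules are the copyright owner's rules on how the content may be used and distributed, for example unrestricted use and distribution, or upload permitted after clicking a related advertisement. Copyright management rules can also be tied to specific users or business entities, for example allowing upload to a particular website, or allowing a particular user to download a content item N times.
(4) For filtering, an explanation of the reason for filtering can be submitted.
(5) Other content attributes, such as author information.
In another embodiment of the present invention, multiple CIM entities 11 may exist in the network, each responsible for content identification management in one domain. The registration subject 13 does not need to register a content item with every CIM entity 11 one by one; it registers the content with one CIM entity 11 (for example CIM-A), and CIM-A synchronizes it to the other CIM entities (for example CIM-B), so that business entities in CIM-B's domain can manage their content against the registered content. This lowers the burden and cost of the registration subject 13 and the business entity 12.
In addition, the business entity 12 that uses the CIM entity 11 to verify and query content can be a website, a gateway, a user terminal, or the like.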
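FIG. 2 is a flowchart of a content identification method in an embodiment of the present invention, which includes the following steps:
Step S201: Select a feature extraction algorithm according to the content type and the business entity 12's purpose in managing the content, and extract a content feature value. The algorithm that the business entity 12 uses should be chosen according to the type of the content and the business entity's purpose in managing it, and it is the same as the feature extraction algorithm that the CIM entity 11 uses in the registration process.
Step S202: Obtain the content attributes of the registered content according to the extracted content feature value. This can be done as follows:
The business entity 12 looks for a content certificate locally and obtains the content attributes from it. Specifically, after finding a content certificate, the business entity 12 verifies its signature and, if verification succeeds, obtains the content attributes from the certificate.
Alternatively, the content attributes of the registered content can be obtained as follows:
The business entity 12 sends a content attribute query request to the CIM entity 11 and receives the content attributes that the CIM entity 11 returns. The request includes the content feature value, the feature value generation algorithm, and the query request type.
The query request type includes upload copyright validation, download copyright validation, filtering management, and/or software verification.
When the query request type is upload copyright validation, the content attribute query request must also include the business entity identifier; or
when the query request type is software verification, the content attribute query request also includes the software name and version information; or
when the query request type is download copyright validation, the content attribute query request must also include the user identifier.
Step S203: Manage the content of the business entity 12 according to the obtained content attributes.
FIG. 3 is a flowchart of a content identification method in another embodiment of the present invention, which includes the following steps: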
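Step S301: Receive a content attribute query request sent by the business entity 12. The request includes a content feature value and a query request type. The query request type includes upload copyright validation, download copyright validation, filtering management, and software verification.
Step S302: According to the content feature value carried in the request, search the content feature database of the CIM entity 11 for the feature value of stored content that is closest to that content feature value.
Step S303: After finding the feature value of stored content that is closest to the content feature value, obtain the content attributes of the registered content according to the query request type carried in the request, and return them to the business entity 12 so that it can manage its content.
The CIM entity 11 obtains the content attributes according to the query request type carried in the request as follows:
When the query request type is upload copyright validation, the request must also include the business entity identifier. The CIM entity 11 obtains the copyright management rules for the business entity 12 identified by it, so the content attributes that the CIM entity 11 returns also include the copyright statement and those copyright management rules; or
when the query request type is software verification, the request also includes the software name and version information. The CIM entity 11 uses the software name and version information carried in the request to search its content feature database for the feature value of stored content that is closest to the content feature value; after finding it, the CIM entity 11 obtains the content attributes according to the query request type and returns them to the business entity; or
when the query request type is download copyright validation, the request must also include the user identifier.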
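In the registration process, the CIM entity 11 receives the content registration request sent by the registration subject 13 and registers the content submitted by the registration subject 13.
The CIM entity 11 can register the content as follows:
The CIM entity 11 selects a feature generation algorithm according to the content type and the registration purpose parameter, generates a content feature value, and stores the generated feature value together with the content attributes of the content.
For example, if the content type is text and the registration purpose is filtering (such content is typically a short message, an e-mail, or a multimedia message), a traditional hash algorithm such as SHA-1 (Secure Hash Algorithm 1), SHA-256, or MD5 (Message Digest Algorithm 5) can be used to compute the hash of the content; alternatively, keywords can first be extracted from the text and the hash of the keywords computed. The resulting hash is used as the feature value of the content.
For example, if the content type is text and the registration purpose is copyright protection (such content is typically a novel, an essay, or a news report), a traditional hash algorithm such as SHA-1, SHA-256, or MD5 can be used to compute the hash of the content, and the resulting hash is used as the feature value. The granularity can be adjusted by policy: the hash can be computed over the whole text, or separately for each paragraph or sentence, in which case the array of per-paragraph or per-sentence hashes is the feature value of the whole text.
For example, if the content type is image, audio, or video, a content-based hash algorithm is used to extract the feature value. Content-based hash algorithms include content feature extraction algorithms and robust hash algorithms.
The CIM entity 11 can further generate a content certificate for the registered content. The content certificate includes the feature value of the registered content, the content attributes, the registration subject, and the method used to verify the authenticity of the content attributes.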
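When multiple CIM entities exist in the network, the CIM entity 11 synchronizes the registration information of the registration subject 13's content to the other CIM entities, which then generate content certificates for that content from the registration information; the certificate records, among other fields, the method by which the CIM entity 11 verified the authenticity of the content attributes. The registration subject 13 therefore registers with only one CIM entity 11 rather than with every CIM entity one by one, and the business entities 12 in the other CIM entities' domains can still manage their content against the registered content. This has the following beneficial effects. The prior art uses audio identification to block uploads of protected content. A database of feature values and copyright management rules for copyright-protected content is first built inside the website; once the database is built, content uploaded by users can be managed. When a user uploads a content item to the website from a device, a copyright management module extracts the feature value FP1 (fingerprint) of the content using the same algorithm and searches the database for the closest stored feature value FP2. If FP2 exists in the database and the difference between FP2 and FP1 is less than a threshold, the content corresponding to FP1 is considered to be the same content as that corresponding to FP2, and the copyright management rules for that content in the database are applied to the uploaded content.
However, the inventors of the present invention found that this prior art has the following drawbacks. Because the volume of audio, video, and similar content is enormous, maintaining and managing a content feature database greatly increases a website's costs. As a business entity, a website can use the database only for matters related to its own service and cannot offer it to other applications. Some weaker websites cannot afford to build such a database, and a CP cannot negotiate this management scheme with every website, so a large number of websites do not manage copyrighted content at all. This embodiment instead uses one or more separate CIM entities 11 for content identification, so that every business entity in the network can manage content; the content data used for identification in the CIM entity 11 is available to all business entities, which reduces their burden.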
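FIG. 4 is a flowchart of content feature extraction and certificate generation in an embodiment of the present invention. After receiving a content registration request, the CIM entity 11 generates the content feature value and stores it, together with other data, in the content feature database. The process can include the following steps:
Step S401: Authenticate the registration subject 13 and determine the authenticity of the content attributes of the content that it submitted.
The CIM entity 11 registers content for the registration subject 13 only after the registration subject 13 has been authenticated. The CIM entity 11 also needs to verify the authenticity of the content attributes that the registration subject 13 provides. For copyright protection, the attributes that most need verification are the registration subject 13's ownership of the content and the copyright statement: the registration subject 13 must present legally valid proof of ownership before the CIM entity 11 accepts its ownership claim. If proof of ownership cannot be provided, the CIM entity 11 can refuse registration for copyright protection purposes. Other attributes, such as content type and author, can be treated as authentic once the CIM entity 11 has authenticated the registration subject 13, or their authenticity can be established after some manual verification.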
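Step S402: Generate the feature value of the content according to the content type and the registration purpose parameter. Before doing so, the CIM entity 11 selects a feature generation algorithm according to the content type and the registration purpose parameter.
For example, if the content type is text and the registration purpose is filtering (such content is typically a short message, an e-mail, or a multimedia message), a traditional hash algorithm such as MD5 (Message Digest Algorithm 5) can be used to compute the hash of the content; alternatively, keywords can first be extracted from the text and the hash of the keywords computed. The resulting hash is used as the feature value of the content.
For example, if the content type is text and the registration purpose is copyright protection (such content is typically a novel, an essay, or a news report), a traditional hash algorithm such as MD5 can be used to compute the hash of the content, and the resulting hash is used as the feature value. The granularity can be adjusted by policy: the hash can be computed over the whole text, or separately for each paragraph or sentence, in which case the array of per-paragraph or per-sentence hashes is the feature value of the whole text.
For example, if the content type is image, audio, or video, a content-based hash algorithm is used to extract the feature value. Content-based hash algorithms include content feature extraction algorithms and robust hash algorithms.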
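Step S403: The CIM entity 11 assigns an ID (identifier) to the content. The ID makes it easier for the CIM entity 11 to organize content and to look content up by ID, for example during software verification or advertisement triggering. For copyright protection or filtering, however, the CIM entity 11 should look content up by its content feature value rather than by ID.
In the embodiments of the present invention, IDs can be assigned in URN (Uniform Resource Name) format.
In another embodiment, the process can also include step S404, in which the CIM entity 11 generates a content certificate for the content. Once generated, the content certificate can be attached to the header of the content file and distributed with the content, or it can be distributed separately, for example by periodic synchronization to the business entity 12. The content certificate allows later content verification to proceed without connecting to the CIM entity 11, which reduces the load on the CIM entity 11 and makes verification more efficient.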
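The content certificate can include the content ID, the feature value of the content, the necessary attributes of the content, the registration subject 13, the way in which the CIM entity 11 determined the authenticity of the content attributes, and the CIM entity 11's signature over this information. The format of a content certificate is shown in FIG. 5. The Verify means field has the format CIMID-Auth-AttrVerify: the combination of the CIMID (the CIM identifier), the method by which the CIM authenticates the registration subject 13 (Auth), and the method used to verify the authenticity of the content attributes (AttrVerify, attribute verification) forms a verification method family. A specific verification method family must be defined for each concrete combination of authentication and verification methods, for example:
CIMID-PKI-OwnerEvidwithManual
CIMID-Smartcard-OwnerEvidwithManual
CIMID-Kerberos-OwnerEvid
The above are only a few examples of specific verification method families; the embodiments of the present invention are not limited to them.
In an actual content certificate, the CIMID must be set to the identifier of the specific CIM entity 11. PKI (Public Key Infrastructure), Smartcard (smart card), and Kerberos (the Kerberos authentication protocol) identify three authentication methods. OwnerEvidwithManual indicates that the CIM entity 11 verified the authenticity of the content attributes by requiring the registration subject 13 to provide legally valid proof of ownership and by manually verifying the authenticity of the other content attributes. OwnerEvid indicates that the copyright notice was verified through the proof of ownership, while the other attributes were automatically accepted as authentic once the registration subject 13 had been authenticated.
Step S405: The CIM entity 11 stores the content ID, the feature value of the content, and the content attributes in the content feature database.
Step S406: The CIM entity 11 returns the registration result to the registration subject 13.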
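In another embodiment, a plurality of CIM entities exist in the network, and the registration process may further include the following: after the registration subject 13 registers the content with one CIM entity 11 (hereinafter CIM-A), CIM-A synchronizes the registration information to the other CIM entities (CIM-B is used as an example below). In this embodiment, the registration subject 13 needs to register with only one CIM entity 11 rather than with every CIM entity one by one, and the business entities 12 in the domains of the other CIM entities can still manage their content against the registered content.
In the synchronization process, the registration information that CIM-A needs to pass to CIM-B includes: the content ID, the feature value of the content, the content attributes and registration subject information submitted at registration, the way in which the content was submitted for registration, and the method by which CIM-A determined the authenticity of the registered content and of the content attributes. Once CIM-A's method of determining the authenticity of the content attributes has been passed to CIM-B, CIM-B can judge the authenticity of the content attributes according to its own rules together with that method.
CIM-A can pass this registration information to CIM-B in several ways, such as FTP, SOAP, or SHTTP (Secure HyperText Transfer Protocol), but a security mechanism must be used to ensure the integrity of the information in transit. For example, a TLS (Transport Layer Security) connection can be established between CIM-A and CIM-B, over which CIM-A sends the information about each registered content item to CIM-B; the registration information of multiple content items can be sent within one TLS connection. Alternatively, the registration information of a content item can be sent as a certificate; because the CIM signature contained in the certificate ensures the integrity of the information in it, no other security measure is needed.
After receiving the registration information of each content item from CIM-A, CIM-B saves it in its database.
CIM-B generates a content certificate for the registration information of each content item. In the Verify Means field of the generated content certificate, the CIMID should be set to CIM-A rather than CIM-B. This way, when verifying the certificate, the business entity 12 knows that the content was registered with CIM-A and can decide, according to its own policy, whether to trust CIM-A.
CIM-B signs the content registration information and adds the signature to the certificate it generates. If CIM-A sent the content registration information to CIM-B in the form of a content certificate, CIM-B can remove CIM-A's signature from the certificate and then add its own. The content certificate carries CIM-B's signature so that the business entity 12 can verify the certificate. A CIM entity 11 is usually responsible for one domain; a business entity 12 in that domain generally holds the certificate of that CIM entity 11 but not the certificates of CIM entities in other domains, so it cannot verify a signature made by a CIM entity of another domain over the registration information in a content certificate. Therefore, after receiving content registration information synchronized from another CIM entity, each CIM entity 11 needs to sign that information using its own certificate.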
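FIG. 6 is a schematic diagram of the verification procedure in an embodiment of the present invention. After receiving content, the service entity 12 needs to manage operations on the content to meet various application requirements. The service entity 12 may be a website, a user terminal, a filtering gateway, or the like. Its management of content operations includes but is not limited to: copyright verification and distribution management, filtering of spam or illegal information, and software verification for anti-virus purposes.
The procedure includes the following steps:
Step S601: the service entity 12 receives content.
Step S602: the service entity 12 extracts the content feature value of the received content. The extraction algorithm should be selected according to the type of the content and the purpose for which the service entity 12 manages the content, and it should be the same feature extraction algorithm that the CIM entity 11 used in the registration procedure when registering the content submitted by the registration subject 13.
The criteria by which the service entity 12 selects the feature value extraction algorithm should be consistent with those of the CIM entity 11. For example, the two can stay consistent by following a unified standard or by agreeing on the method in advance.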
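One way to keep the two sides consistent is for both to call the same selection function. The sketch below follows the selection rule recited in claim 3 of this document; the string labels, the function names, and the choice of the keyword variant for text filtering (claim 3 also allows hashing the full text) are assumptions. MD5 is used because the description names it as an example of a traditional hash algorithm.

```python
import hashlib

MEDIA_TYPES = {"image", "audio", "video"}


def select_feature_algorithm(content_type: str, purpose: str) -> str:
    """Map (content type, management purpose) to an algorithm family."""
    if content_type == "text" and purpose == "filtering":
        # Claim 3 also permits a traditional hash of the whole text here.
        return "keyword_hash"
    if content_type == "text" and purpose == "copyright":
        return "traditional_hash"
    if content_type in MEDIA_TYPES:
        # Content feature extraction followed by a robust hash; not implemented here.
        return "content_based_hash"
    raise ValueError(f"no rule for type={content_type!r}, purpose={purpose!r}")


def traditional_hash(content: bytes) -> str:
    """Traditional hash of the full content, returned as a hex string."""
    return hashlib.md5(content).hexdigest()
```

If the CIM entity at registration time and the service entity at verification time both use select_feature_algorithm, a feature value extracted by one can be compared with a feature value stored by the other.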
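Step S603: the service entity 12 sends a content verification and content attribute query request to the CIM entity 11. The mandatory parameters carried in the content attribute query request are the content feature value, the feature value generation algorithm, and the query request type. The optional parameters are the other parameters required by the particular query request type.
The query request type indicates the purpose for which the service entity 12 requests content verification and content attribute query; the CIM entity 11 processes the request according to this parameter. Query request types include but are not limited to:
Upload copyright validate: when a user uploads content to a website, the upload is managed according to the copyright;
Download copyright validate: when a user downloads content from the network, the copyright of the content and the user's download permission are managed;
Anti-spam filtering: spam and illegal information are filtered;
Software verification: software is verified for anti-virus purposes.
When the query request type is Upload copyright validate, the query request should also carry the ID of the service entity 12 (often a video sharing website), which the CIM entity 11 uses to look up the copyright management information corresponding to that service entity 12.
When the query request type is Download copyright validate, the query request should also carry the ID of the user downloading the content, which the CIM entity 11 uses to look up the security management rules corresponding to that user.
When the query request type is Software verification, the query request should also carry information such as the name and version number of the software.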
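The request parameters listed above can be modeled as a small data structure that knows which extra parameters each query request type needs. The class names, field names, and the dictionary of required extras are hypothetical; the text defines only which information must be carried, not a wire format.

```python
from dataclasses import dataclass, field
from enum import Enum


class QueryType(Enum):
    UPLOAD_COPYRIGHT_VALIDATE = "Upload copyright validate"
    DOWNLOAD_COPYRIGHT_VALIDATE = "Download copyright validate"
    ANTI_SPAM_FILTERING = "Anti-spam filtering"
    SOFTWARE_VERIFICATION = "Software verification"


# Extra parameters each query request type must carry, per the description.
REQUIRED_EXTRAS = {
    QueryType.UPLOAD_COPYRIGHT_VALIDATE: ("service_entity_id",),
    QueryType.DOWNLOAD_COPYRIGHT_VALIDATE: ("user_id",),
    QueryType.ANTI_SPAM_FILTERING: (),
    QueryType.SOFTWARE_VERIFICATION: ("software_name", "software_version"),
}


@dataclass
class ContentQueryRequest:
    feature_value: str      # mandatory
    feature_algorithm: str  # mandatory
    query_type: QueryType   # mandatory
    extras: dict = field(default_factory=dict)  # type-specific parameters

    def missing_parameters(self) -> list:
        """Names of type-specific parameters that are absent or empty."""
        needed = REQUIRED_EXTRAS[self.query_type]
        return [name for name in needed if not self.extras.get(name)]
```

A service entity could call missing_parameters() before sending, and a CIM entity could call it on receipt to reject incomplete requests.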
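Step S604: the CIM entity 11 looks up the corresponding stored registered content information in the database.
When looking up the stored registered content information in the content feature database, the CIM entity 11 may search by feature value, or by the content ID and/or content name. The choice depends on the application purpose, because different purposes address different practical problems.
When content needs copyright protection or filtering, the content can be considered infringing, or subject to filtering, as long as the key features of its visually perceptible part match the protected content. Such content commonly has its name and ID altered to evade management. For this kind of content, the CIM entity 11 should therefore search the content feature database by feature value.
For software verification and some other applications, such as advertisement triggering, the content name is generally not maliciously altered in the application scenario, or an alteration can be detected by the user or the service entity 12. For example, software verification is triggered automatically after a user downloads a piece of software. A user generally downloads software only after seeing that its name and version are the ones needed, and software download sites generally do not change software names to deceive users, so the main problem that software verification solves is preventing viruses or malicious plug-ins from being embedded in the software. For this kind of content, the CIM entity 11 should search the content feature database by content name or ID.
In the embodiments of the present invention, when the query is made by content identifier, the identifier may be the content name plus auxiliary information such as the version number, the ID that the CIM entity 11 assigned to the content, or a combination of the content name, auxiliary information, and the ID.
When the content feature database is searched by feature value, if the feature value generation algorithm in the query request is a traditional hash algorithm such as MD5, the feature value of the content to be verified must be identical to the stored feature value. If the feature value generation algorithm is a content-based hash algorithm, the two are considered feature values of the same content when the difference between the feature value of the content to be verified and the stored feature value is less than a threshold. The threshold is determined by the specific algorithm.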
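The two matching rules can be sketched as follows. The text does not define how the difference between two content-based feature values is measured; the sketch assumes the feature values are fixed-length bit strings held as integers and uses the Hamming distance, which is a common choice for robust hashes but is an assumption here, as are all function and parameter names.

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def find_match(stored: dict, query_value: int, traditional: bool, threshold: int = 0):
    """Return the content ID of the matching stored feature value, or None.

    stored maps content ID -> stored feature value.
    traditional=True  : exact equality is required (for example MD5).
    traditional=False : the closest stored value matches if its distance
                        to the query value is below the threshold.
    """
    if traditional:
        for content_id, value in stored.items():
            if value == query_value:
                return content_id
        return None
    best = min(stored.items(),
               key=lambda item: hamming_distance(item[1], query_value),
               default=None)
    if best is not None and hamming_distance(best[1], query_value) < threshold:
        return best[0]
    return None
```

A linear scan is used only to keep the sketch short; a real content feature database would need an index suited to nearest-neighbor search.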
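When the content feature database is searched by content identifier and the search fails, the CIM entity 11 may search again by feature value, but whether to do so should weigh the probability of success against the cost. Software verification is generally triggered automatically after a user finishes downloading the software, or by a service gateway when the software passes through it, so it can be assumed that the software is the one the user wants, that is, the software name, version number, and so on are correct. Therefore, if a search by ID or name finds no corresponding stored content, the software has most likely never been registered with the CIM entity. Alternatively, the ID or name of the software may have been changed, in which case a search by ID also fails. If a further search by feature value (fingerprint, FP) finds corresponding stored content, the CIM entity 11 concludes that the content itself is intact but its name or ID has been changed, and it is still meaningful to tell the user so. If that search also fails, either other information has been inserted into the content and its name has been changed, or the content has not been registered, and the CIM entity cannot determine what the content actually is. Whether the CIM entity 11 reports these two results to the user is decided by the internal policy of the CIM entity 11.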
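The fallback described above can be sketched as a two-stage lookup. The two dictionaries standing in for the content feature database, the result labels, and the retry_by_feature flag (which represents the cost-versus-success trade-off left to the CIM entity's policy) are all hypothetical.

```python
def lookup_software(by_identifier: dict, by_feature: dict,
                    name: str, version: str, feature_value: str,
                    retry_by_feature: bool = True) -> str:
    """Look up by (name, version) first, then optionally by feature value.

    by_identifier maps (name, version) -> stored feature value.
    by_feature    maps feature value   -> (name, version).
    """
    if (name, version) in by_identifier:
        return "found_by_identifier"
    if retry_by_feature and feature_value in by_feature:
        # The content is registered, but under a different name or ID.
        return "content_intact_but_renamed"
    # Either never registered, or both the content and its name were changed.
    return "unknown"
```

In the first case the CIM entity would go on to compare the stored feature value with the one in the request; in the other two cases its internal policy decides what to report to the user.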
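Step S605: the CIM entity 11 obtains the relevant attributes of the content according to the query type.
When the query type is Upload copyright validate, the CIM entity 11 queries the copyright statement and copyright management rules of the content, and checks whether the copyright management rules contain a specific management rule for the service entity 12 identified by the service entity ID carried in the query request.
When the query type is Download copyright validate, the CIM entity 11 queries the copyright statement and copyright management rules of the content, and checks whether the copyright management rules contain a specific management rule corresponding to the user ID carried in the query request.
When the query type is Anti-spam filtering, the CIM entity 11 queries the category of the content and the filtering requirements.
When the query type is Software verification, the CIM entity 11 in step S604 does not look up the content by the feature value in the query request. Instead, it looks up the stored feature value by the name and version number of the software, compares the two feature values, and puts the comparison result in the response message.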
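Step S605 amounts to a dispatch on the query type. In the sketch below, the record layout, the key names, and the fallback to a general rule when no specific rule exists are assumptions; the text only says that the CIM entity checks whether a specific rule is present.

```python
def attributes_for_query(record: dict, query_type: str, params: dict) -> dict:
    """Select the attributes of a registered record that the query type asks for."""
    if query_type in ("Upload copyright validate", "Download copyright validate"):
        # Uploads are keyed by service entity ID, downloads by user ID.
        key = "service_entity_id" if query_type.startswith("Upload") else "user_id"
        rules = record["copyright_rules"]
        specific = rules["specific"].get(params[key])
        return {
            "copyright_statement": record["copyright_statement"],
            "rule": specific if specific is not None else rules["general"],
        }
    if query_type == "Anti-spam filtering":
        return {
            "category": record["category"],
            "filter_requirement": record["filter_requirement"],
        }
    if query_type == "Software verification":
        # The record was located by name and version; compare feature values.
        return {"feature_match": record["feature_value"] == params["feature_value"]}
    raise ValueError(f"unsupported query type: {query_type}")
```

The returned dictionary corresponds to the content attributes that step S606 places in the response message.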
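Step S606: the CIM entity 11 returns a response message to the service entity 12. The response message contains a processing result code and the relevant content attributes. The service entity 12 can perform the corresponding management operations on its content according to the returned content attributes. So that the service entity 12 does not need to query the CIM entity 11 again for the same content, the CIM entity 11 may include the content certificate of the content in the response message sent to the service entity 12.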
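In another implementation of the present invention, after step S602 the service entity 12 may obtain the content attributes through the procedure shown in FIG. 7, which includes the following steps:
Step S701: look up a content certificate. If the content received by the service entity 12 carries a content certificate, or the service entity 12 retrieves a corresponding content certificate locally by content identifier, name, or feature value, the service entity 12 can obtain the content attributes from that content certificate. If no content certificate is found, step S603 is performed directly. If a content certificate is found, step S702 is performed.
Step S702: verify the signature of the CIM entity 11. If the signature was made by the CIM entity 11 of the domain to which the service entity 12 belongs, or the service entity 12 can obtain the certificate of the CIM entity 11 that signed the content certificate, the signature can be verified with that CIM certificate. If the signature verifies, step S703 is performed; if verification fails, step S603 is performed directly.
Step S703: determine whether the feature value in the content certificate is consistent with the content feature value extracted in step S602.
The two feature values are consistent if they are equal or if their difference is less than a threshold. If they are inconsistent, the content certificate carried with the content is not the certificate of this content, and the service entity 12 should connect to the CIM entity 11 to query, that is, perform step S603. If they are consistent, step S704 is performed.
Step S704: obtain from the content certificate the service attributes needed to manage the content, and perform the corresponding management operations according to them.
If the above procedure for obtaining content attributes from a content certificate succeeds, the service entity 12 does not need to connect to the CIM entity 11, which reduces the load on the CIM entity 11 and makes the service entity 12 more efficient at obtaining content attributes and performing content management operations.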
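The certificate-based procedure of steps S701 to S704 can be sketched as a chain of checks, each of which falls back to the query of step S603 on failure. The certificate layout and the two callables passed in are hypothetical; signature verification and feature value comparison are left to the caller because the text allows several concrete methods for each.

```python
def attributes_from_certificate(cert, extracted_value, verify_signature, values_match):
    """Return the attributes from a content certificate, or None when the
    service entity has to fall back to querying the CIM entity (step S603)."""
    if cert is None:                      # S701: no certificate found
        return None
    if not verify_signature(cert):        # S702: CIM signature does not verify
        return None
    if not values_match(cert["feature_value"], extracted_value):
        return None                       # S703: certificate belongs to other content
    return cert["attributes"]             # S704: use the certified attributes
```

A caller would treat a None result as the signal to send the content attribute query request of step S603 to the CIM entity.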
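The content identification method described above uses centralized CIM entities 11, synchronized across domains, to provide a basic service with common functions for copyright management, filtering, software verification, advertisement triggering, and the like, which reduces the burden and cost of the service entity 12. Through the content certificate mechanism, the service entity 12 does not have to connect to the CIM entity 11 every time it verifies content, which reduces the load on the CIM entity 11.
In addition, in the embodiments of the present invention, the CIM entity 11 selects the most suitable identification method for the characteristics of each content type so as to identify content accurately, and the service entity 12 manages its content according to the content attributes of different content, which improves robustness and reduces the error rate.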
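FIG. 8 is a structural diagram of the content identification management (CIM) entity 11 in an embodiment of the present invention, which includes:
A content registration module 111, configured to receive a content registration request sent by the registration subject 13 and register the content submitted by the registration subject 13. Registering the content of the registration subject 13 includes feature extraction, attribute authenticity checking, and content certificate generation.
A verification and query processing module 112, configured to: receive a content attribute query request sent by the service entity 12, where the request includes a content feature value and a query request type; search the content feature database of the CIM entity 11, according to the content feature value carried in the request, for the feature value of stored content that is closest to that content feature value; after finding it, obtain the content attributes of the registered content according to the query request type carried in the request; and return the obtained content attributes to the service entity 12 so that the service entity 12 can manage its content.
A content feature database 113, configured to store the content feature values and content attributes of registered content. The content feature database 113 stores content feature values, attributes, and certificates; the storage may take the form of a database or of files in a file system, for example files in XML (Extensible Markup Language) format.
The content registration module 111 includes a feature value generation submodule 1111, configured to select a feature generation algorithm according to the content type and the registration purpose parameter, generate the content feature value, and store the generated content feature value and the content attributes of the content in the content feature database 113.
The content registration module 111 further includes a certificate generation submodule 1112, configured to generate a content certificate for the registered content, where the content certificate includes the feature value of the registered content, the content attributes, the registration subject, and the method of verifying the authenticity of the content attributes.
The verification and query processing module 112 includes:
A feature value lookup submodule 1121, configured to search the content feature database 113, according to the content feature value carried in the content attribute query request, for the feature value of stored content that is closest to that content feature value;
An attribute obtaining submodule 1122, configured to obtain, after the feature value lookup submodule 1121 finds the closest stored feature value, the content attributes according to the query request type carried in the content attribute query request, and to return the obtained content attributes to the service entity 12.
The CIM entity 11 further includes a synchronization module 114, configured to synchronize the registration information of content to CIM entities other than the CIM entity 11.
FIG. 9 is a structural diagram of the service entity 12 in an embodiment of the present invention, which includes:
A feature value extraction module 121, configured to select a feature extraction algorithm according to the content type and the purpose for which the service entity 12 manages content, and to extract the content feature value;
An attribute obtaining module 122, configured to obtain, according to the content feature value extracted by the feature value extraction module 121, the content attributes of the registered content on the CIM entity 11;
A content management module 123, configured to manage the content of the service entity 12 according to the content attributes obtained by the attribute obtaining module 122.
The attribute obtaining module 122 may include a lookup obtaining submodule 1221, configured to look up a content certificate and obtain the content attributes from the content certificate found.
In another embodiment of the present invention, the attribute obtaining module 122 may include a query obtaining submodule, configured to send a content attribute query request to the CIM entity 11 and receive the content attributes returned by the CIM entity 11, where the content attribute query request includes the content feature value, the feature value generation algorithm, and the query request type.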
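From the above description of the implementations, a person skilled in the art can clearly understand that the present invention can be implemented by hardware, or by software plus a necessary general-purpose hardware platform. Based on this understanding, the technical solution of the present invention can be embodied in the form of a software product. The software product can be stored in a non-volatile storage medium (which may be a CD-ROM, a USB flash drive, a removable hard disk, or the like) and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments of the present invention.
In summary, the above are merely preferred embodiments of the present invention and are not intended to limit its protection scope. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its protection scope.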

Claims

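1. A content identification method, comprising the following steps:
selecting a feature extraction algorithm according to the content type and the purpose for which a service entity manages content, and extracting a content feature value;
obtaining the content attributes of registered content according to the content feature value.
2. The content identification method according to claim 1, further comprising, after obtaining the content attributes of registered content according to the content feature value:
managing the content of the service entity according to the obtained content attributes.
3. The content identification method according to claim 1, wherein selecting a feature extraction algorithm according to the content type and the purpose for which the service entity manages content and extracting a content feature value comprises:
if the content type is text and the purpose for which the service entity manages content is filtering, computing a hash value of the content with a traditional hash algorithm, or first extracting keywords from the text content and then computing a hash value of the keywords, and using the computed hash value as the content feature value; or
if the content type is text and the purpose for which the service entity manages content is copyright protection, computing a hash value of the content with a traditional hash algorithm and using the computed hash value as the feature value of the content; or
if the content type is image, audio, or video, extracting the content feature value with a content-based hash algorithm, where the content-based hash algorithm includes a content feature extraction algorithm and a robust hash algorithm.
4. The content identification method according to claim 1, wherein obtaining the content attributes of registered content according to the content feature value comprises:
looking up a content certificate and obtaining the content attributes from the content certificate; or
sending a content attribute query request to a content identification management (CIM) entity and receiving the content attributes returned by the CIM entity, where the content attribute query request includes the content feature value and a query request type.
5. The content identification method according to claim 4, wherein looking up a content certificate and obtaining the content attributes from the content certificate comprises:
after a content certificate is found, verifying the signature of the content certificate found, and after the signature is verified successfully, obtaining the content attributes from the content certificate.
6. The content identification method according to claim 4, wherein the query request type includes: upload content copyright verification, download content copyright verification, filtering management, and software verification.
7. The content identification method according to claim 6, wherein when the query request type is upload content copyright verification, the content attribute query request further includes a service entity identifier; or
when the query request type is software verification, the content attribute query request further includes a software name and version information; or
when the query request type is download content copyright verification, the content attribute query request further includes a user identifier.
8. The content identification method according to claim 1, wherein the registered content is registered on a CIM entity.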
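9. A content identification method, comprising:
receiving a content attribute query request sent by a service entity, where the content attribute query request includes a content feature value and a query request type;
searching, according to the content feature value carried in the content attribute query request, the content feature database of a content identification management (CIM) entity for the feature value of stored content that is closest to the content feature value;
after finding the feature value of stored content that is closest to the content feature value, obtaining the content attributes of registered content according to the query request type carried in the content attribute query request, and returning the obtained content attributes to the service entity.
10. The content identification method according to claim 9, wherein the query request type includes: upload content copyright verification, download content copyright verification, filtering management, and software verification.
11. The content identification method according to claim 10, wherein when the query request type is upload content copyright verification, the content attribute query request further includes a service entity identifier, and the obtained content attributes include a copyright statement and the copyright management rules of the service entity corresponding to the service entity identifier.
12. The content identification method according to claim 10, wherein when the query request type is software verification, the content attribute query request further includes a software name and version information,
and searching, according to the content feature value carried in the content attribute query request, the content feature database of the CIM entity for the feature value of stored content that is closest to the content feature value is specifically:
searching, according to the software name and version information carried in the content attribute query request, the content feature database of the CIM entity for the feature value of stored content that is closest to the content feature value.
13. The content identification method according to claim 10, wherein when the query request type is download content copyright verification, the content attribute query request further includes a user identifier.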
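14. A content registration method for content identification, comprising:
receiving a content registration request sent by a registration subject;
selecting a feature generation algorithm according to the content type and registration purpose parameter carried in the content registration request, and generating a content feature value;
storing the generated content feature value and the content attributes of the content, thereby completing registration of the content submitted by the registration subject.
15. The content registration method for content identification according to claim 14, wherein registering the content submitted by the registration subject further comprises:
generating a content certificate for the registered content, where the content certificate includes the feature value of the registered content, the content attributes, the registration subject, and the method of verifying the authenticity of the content attributes.
16. The content registration method for content identification according to claim 15, wherein the method of verifying the authenticity of the content attributes includes: the identifier of the CIM entity, the method by which the CIM entity authenticates the registration subject, and the method of verifying the authenticity of the content attributes.
17. The content registration method for content identification according to claim 14, further comprising: synchronizing the registration information of the content to CIM entities other than the CIM entity, where the other CIM entities generate a content certificate for the content according to the registration information of the content, and the content certificate generated by the other CIM entities includes the identifier of the original CIM entity and the method by which the original CIM entity verified the authenticity of the content attributes.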
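18. A content identification system, comprising:
a service entity, configured to select a feature extraction algorithm according to the content type and the purpose for which the service entity manages content, extract a content feature value, and obtain, according to the content feature value, the content attributes of registered content on a content identification management (CIM) entity; and
the content identification management (CIM) entity, configured to receive a content attribute query request sent by the service entity, where the content attribute query request includes the content feature value and a query request type; search the content feature database of the CIM entity, according to the content feature value carried in the content attribute query request, for the feature value of stored content that is closest to the content feature value; after finding it, obtain the content attributes of registered content according to the query request type carried in the content attribute query request; and return the obtained content attributes to the service entity.
19. The content identification system according to claim 18, further comprising:
a registration subject, configured to send a content registration request to the CIM entity, requesting the CIM entity to register the content submitted by the registration subject.
20. A service entity, comprising:
a feature value extraction module, configured to select a feature extraction algorithm according to the content type and the purpose for which the service entity manages content, and to extract a content feature value;
an attribute obtaining module, configured to obtain, according to the content feature value extracted by the feature value extraction module, the content attributes of registered content on a content identification management (CIM) entity.
21. The service entity according to claim 20, further comprising:
a content management module, configured to manage the content of the service entity according to the content attributes obtained by the attribute obtaining module.
22. The service entity according to claim 20, wherein the attribute obtaining module includes:
a lookup obtaining submodule, configured to look up a content certificate and obtain the content attributes from the content certificate found; or
a query obtaining submodule, configured to send a content attribute query request to the CIM entity and receive the content attributes returned by the CIM entity, where the content attribute query request includes the content feature value, the feature value generation algorithm, and the query request type.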
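23. A content identification management (CIM) entity, comprising:
a content registration module, configured to receive a content registration request sent by a registration subject and register the content submitted by the registration subject;
a content feature database, configured to store the content feature values and content attributes of registered content;
a verification and query processing module, configured to receive a content attribute query request sent by a service entity, where the content attribute query request includes a content feature value and a query request type; search the content feature database of the CIM entity, according to the content feature value carried in the content attribute query request, for the feature value of stored content that is closest to the content feature value; after finding it, obtain the content attributes of registered content according to the query request type carried in the content attribute query request; and return the obtained content attributes to the service entity.
24. The CIM entity according to claim 23, wherein the content registration module includes:
a feature value generation submodule, configured to select a feature generation algorithm according to the content type and the registration purpose parameter, generate a content feature value, and store the generated content feature value and the content attributes of the content in the content feature database.
25. The CIM entity according to claim 24, wherein the content registration module further includes:
a certificate generation submodule, configured to generate a content certificate for the registered content, where the content certificate includes the feature value of the registered content, the content attributes, the registering service entity, and the method of verifying the authenticity of the content attributes.
26. The CIM entity according to claim 23, wherein the verification and query processing module includes:
a feature value lookup submodule, configured to search the content feature database, according to the content feature value carried in the content attribute query request, for the feature value of stored content that is closest to the content feature value;
an attribute obtaining submodule, configured to obtain, after the feature value lookup submodule finds the feature value of stored content that is closest to the content feature value, the content attributes according to the query request type carried in the content attribute query request, and to return the obtained content attributes to the service entity.
27. The CIM entity according to claim 23, further comprising:
a synchronization module, configured to synchronize the registration information of the content to CIM entities other than the CIM entity.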
PCT/CN2008/073001 2008-04-07 2008-11-10 一种内容识别的方法、系统和装置 WO2009124440A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP08873839A EP2264634A4 (en) 2008-04-07 2008-11-10 METHOD, SYSTEM AND DEVICE FOR CONTENT IDENTIFICATION
US12/900,273 US20110029555A1 (en) 2008-04-07 2010-10-07 Method, system and apparatus for content identification

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN200810089543.X 2008-04-07
CN200810089543XA CN101251881B (zh) 2008-04-07 2008-04-07 一种内容识别的方法、系统和装置

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US12/900,273 Continuation US20110029555A1 (en) 2008-04-07 2010-10-07 Method, system and apparatus for content identification

Publications (1)

Publication Number Publication Date
WO2009124440A1 true WO2009124440A1 (zh) 2009-10-15

Family

ID=39955267

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2008/073001 WO2009124440A1 (zh) 2008-04-07 2008-11-10 一种内容识别的方法、系统和装置

Country Status (4)

Country Link
US (1) US20110029555A1 (zh)
EP (1) EP2264634A4 (zh)
CN (1) CN101251881B (zh)
WO (1) WO2009124440A1 (zh)

Also Published As

Publication number Publication date
CN101251881B (zh) 2010-04-14
US20110029555A1 (en) 2011-02-03
EP2264634A4 (en) 2011-04-20
CN101251881A (zh) 2008-08-27
EP2264634A1 (en) 2010-12-22

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 08873839

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

WWE Wipo information: entry into national phase

Ref document number: 2008873839

Country of ref document: EP