WO2009152709A1 - Content identification method and system, and content management client and server - Google Patents
- Publication number
- WO2009152709A1 (PCT/CN2009/071626)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- content
- identification
- information
- identified
- recognition
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/10—Protecting distributed programs or content, e.g. vending or licensing of copyrighted material ; Digital rights management [DRM]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/40—Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
- G06F16/41—Indexing; Data structures therefor; Storage structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/10—Protecting distributed programs or content, e.g. vending or licensing of copyrighted material ; Digital rights management [DRM]
- G06F21/16—Program or content traceability, e.g. by watermarking
Definitions
- The present invention relates to the field of information security technologies, and in particular to a content identification method and system, and a content management client and server.
Background
- An existing secure content identification mechanism (Secure Content Identification Mechanism, hereinafter SCIDM) identifies content based on the content's feature values (hereinafter: feature-value-based identification).
- An SCIDM-based content identification system defines a content management server (SCIDM Server), also called a Content Identification Manager (CIM).
- Protected content is registered with the content management server, which extracts and saves the feature values of the protected content and also saves its related attributes (such as copyright ownership information and copyright protection rules).
- The SCIDM-based content identification system also defines a content management client (SCIDM Client), also called a monitoring entity (Monitor Entity).
- The content management client monitors whether content that passes through, or is sent to, an entity such as a monitoring gateway, a user terminal, or a content sharing website infringes copyright.
- The content management client extracts a feature value from the received content and sends it to the content management server, which searches locally for a feature value of protected content that matches the extracted feature value. If the content management server finds a match, it returns the related content attributes to the content management client, and the content management client filters, blocks, or otherwise controls the content according to these attributes.
- The inventor has found that, in the existing feature-value-based identification method, both the extraction of feature values by the content management client and the searching and matching performed by the content management server according to those feature values consume a large amount of computing resources. Because the network carries a large amount of audio and video content, the content management client and the content management server are heavily loaded and identification efficiency is low.
Summary of the invention
- Embodiments of the present invention provide a content identification method and system, and a content management client and a server, which are beneficial to reducing the load of the content recognition system and improving the recognition efficiency.
- a first aspect of the embodiments of the present invention provides a content identification method, including:
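- selecting an identification manner for the content to be identified; extracting, from the content to be identified, the identification information corresponding to the selected identification manner; and sending a first content identification request that includes the selected identification manner and the identification information, to request the content management server to adopt the identification manner and identify an attribute of the content to be identified according to the identification information.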
- In the content identification method provided by the first aspect, an identification manner is selected for the content to be identified, the identification information required by that manner is extracted from the content to be identified, and a first content identification request including the identification manner and the identification information is sent to the content management server. The request instructs the content management server to adopt the included identification manner and to identify the attribute of the content to be identified according to the included identification information. The identification manner can therefore be flexibly selected according to the load of the content identification system or the actual security requirement, which reduces the load of the content identification system and improves the efficiency of content identification.
- a second aspect of the embodiments of the present invention provides another content identification method, including:
- receiving a first content identification request sent by a content management client, adopting the identification manner included in the first content identification request, and identifying an attribute of the content to be identified according to the identification information included in the first content identification request.
- In the content identification method provided by the second aspect, the identification manner and identification information selected by the content management client are the ones used in the process of identifying the content to be identified. This allows the content management client to flexibly select the identification manner according to the load of the content identification system or the actual security requirement, thereby reducing the load of the content identification system and improving the efficiency of content identification.
- a third aspect of the embodiments of the present invention provides a content management client, including:
- a selection module configured to select a recognition method of the content to be identified
- An extraction module configured to extract identification information of the content to be identified corresponding to the selected identification manner
- a sending module configured to send a first content identification request, where the first content identification request includes the selected identification manner and identification information and is used to request the content management server to adopt the identification manner and identify an attribute of the content to be identified according to the identification information.
- In the content management client provided by the third aspect, the selection module selects the identification manner of the content to be identified, the extraction module extracts from the content to be identified the identification information required by the selected manner, and the sending module sends to the content management server a first content identification request including the identification manner and the identification information. The request instructs the content management server to adopt the included identification manner and to identify the attribute of the content to be identified according to the included identification information. The content management client can therefore flexibly select the identification manner according to the load of the content identification system or the actual security requirement, which reduces the load of the content identification system and improves the efficiency of content identification.
- a fourth aspect of the embodiments of the present invention provides a content management server, including:
- an obtaining module configured to obtain, from a first content identification request received from the content management client, the identification manner included in the first content identification request and the identification information corresponding to the identification manner;
- an identification module configured to identify the attribute of the content to be identified according to the identification information and the pre-stored content data information by using the identification manner.
- In the content management server provided by the fourth aspect, the obtaining module obtains the identification manner and the identification information included in the first content identification request sent by the content management client, and the identification module uses that manner and that information to identify the attribute of the content to be identified. The identification manner and identification information selected by the content management client are thus the ones used by the identification module. This allows the content management client to flexibly select the identification manner according to the load of the content identification system or the actual security requirement, thereby reducing the load of the content identification system and improving the efficiency of content identification.
- a fifth aspect of the embodiments of the present invention provides a content identification system, including:
- a content management client configured to select an identification manner for the content to be identified, extract the identification information of the content to be identified corresponding to the selected identification manner, and send a first content identification request that includes the selected identification manner and identification information;
- a content management server configured to obtain, from the first content identification request received from the content management client, the identification manner included in the request and the identification information corresponding to the identification manner, and to identify an attribute of the content to be identified, using the identification manner, according to the identification information and pre-stored content data information.
- In the content identification system provided by the fifth aspect, the content management client selects the identification manner of the content to be identified, extracts from the content to be identified the identification information required by the selected manner, and sends to the content management server a first content identification request including the identification manner and the identification information. The content management server adopts the identification manner included in the first content identification request and identifies the content to be identified according to the identification information included in the request.
- FIG. 1 is a flow chart of a first embodiment of a content identification method according to the present invention.
- FIG. 2 is a flow chart of a second embodiment of a content identification method according to the present invention.
- FIG. 3 is a flow chart of a third embodiment of a content identification method according to the present invention.
- FIG. 4 is a flow chart of a fourth embodiment of a content identification method according to the present invention.
- FIG. 5 is a flowchart of a fifth embodiment of a content identification method according to the present invention.
- FIG. 6 is a flow chart of another embodiment of a content identification method according to the present invention.
- FIG. 7 is a schematic structural diagram of a first embodiment of a content management client according to the present invention.
- FIG. 8 is a schematic structural diagram of a second embodiment of a content management client according to the present invention.
- FIG. 9 is a schematic structural diagram of a first embodiment of a content management server according to the present invention.
- FIG. 10 is a schematic structural diagram of a second embodiment of a content management server according to the present invention.
- FIG. 11 is a schematic structural diagram of an embodiment of a content identification system according to the present invention.
Detailed description
- FIG. 1 is a flow chart of a first embodiment of a content identification method according to the present invention. As shown in Figure 1, this embodiment includes:
- Step 11: The content management client selects an identification manner for the content to be identified.
- The content management client can monitor content that passes through, or is sent to, an entity such as a monitoring gateway, a user terminal, or a content sharing website and, during monitoring, perform content identification on that content as actually needed. When the content management client initiates the identification process for the content to be identified, it may select the identification manner according to the load of the content identification system, the specific application scenario, or preset security requirements.
- The identification manner selected by the content management client may be: content-identifier-based identification, tampering-information-based identification, content-metadata-based identification, watermark-based identification, or feature-value-based identification.
- In the order listed, these five identification manners grow progressively more complex, while the robustness and security of identification grow progressively stronger. That is, the simpler the identification manner, the less reliable it is; the more complex the identification manner, the more reliable it is.
- Metadata is usually embedded in the file header and requires special tools to read or write, so disguising content by tampering with its metadata is more difficult. Digital watermarks are randomly embedded in certain bits of the content; because the embedding positions are confidential and hard for ordinary users to learn, disguising content by destroying its digital watermark is more difficult still. The feature-value-based manner extracts key features directly from the content and compares them with the feature values of protected content, so it can be defeated only by destroying the key features of the content. Once the key features have changed, however, the content itself has either been re-created or been severely damaged.
- If the content has been re-edited relative to the protected content, that is, re-created, it no longer infringes the original copyright of the protected content; if the content has been severely damaged, users perceive the damage and such content is usually of no value. The feature-value-based identification manner therefore offers the highest reliability and security.
- The embodiments of the present invention differ from the prior art in that not every content identification needs a complex identification manner (such as feature-value-based identification); instead, an appropriate identification manner can be flexibly selected according to the load of the content identification system, the specific application scenario, or preset security requirements.
- The content management client flexibly selects the identification manner according to the load of the content identification system, the specific application scenario, or preset security requirements. For example, if the content identification system is heavily loaded, a simpler identification manner (such as content-identifier-based or tampering-information-based identification) can be selected, which reduces system load and saves system resources. If security is involved and the preset security requirement is high, a more complex identification manner (such as feature-value-based identification) can be selected to ensure the reliability and security of content identification.
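The selection logic described above can be illustrated with a minimal Python sketch. The patent text does not define a concrete policy, so the function name `select_manner`, the thresholds, the numeric security levels, and the choice of a mid-complexity default are all assumptions made for illustration only.

```python
from enum import IntEnum


class IdentificationManner(IntEnum):
    """The five manners named in the text, ordered from simple to complex."""
    CONTENT_IDENTIFIER = 1
    TAMPERING_INFORMATION = 2
    CONTENT_METADATA = 3
    WATERMARK = 4
    FEATURE_VALUE = 5


def select_manner(system_load: float,
                  security_level: int,
                  load_threshold: float = 0.8,
                  high_security: int = 3) -> IdentificationManner:
    """Pick an identification manner (hypothetical policy, not from the source).

    system_load is a fraction in [0, 1]; security_level is an integer where
    larger means stricter. Both scales and both thresholds are assumptions.
    """
    # A high preset security requirement calls for the most reliable manner.
    if security_level >= high_security:
        return IdentificationManner.FEATURE_VALUE
    # A heavily loaded system falls back to the cheapest manner.
    if system_load >= load_threshold:
        return IdentificationManner.CONTENT_IDENTIFIER
    # Otherwise use a mid-complexity manner (an arbitrary illustrative default).
    return IdentificationManner.CONTENT_METADATA
```

In this sketch the security requirement takes priority over load; a real deployment might weigh the two differently or also consult the application scenario.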
- Step 12: The content management client extracts the identification information of the content to be identified corresponding to the selected identification manner.
- Once the content management client has selected the identification manner, it extracts from the content to be identified the identification information corresponding to that manner. The extracted identification information may include the content identifier, content size information, content source address information, content target address information, metadata, digital watermark, or content feature value of the content to be identified.
- The content identifier may be the name (Name) of the content to be identified or a unique identifier assigned by the content management server to registered content.
- When the selected identification manner is content-identifier-based, the extracted identification information includes the content identifier of the content to be identified and first auxiliary identification information; the first auxiliary identification information includes, but is not limited to, content size information.
- When the selected identification manner is tampering-information-based, the extracted identification information includes the content identifier of the content to be identified and second auxiliary identification information. The second auxiliary identification information may include content source address information, content target address information, or other auxiliary identification information. The content source address information is the network address or network identifier of the entity that sent the content; the content target address information is the network address or network identifier of the target entity to which the content is sent.
- When the selected identification manner is content-metadata-based, the extracted identification information includes the metadata of the content to be identified.
- When the selected identification manner is watermark-based, the extracted identification information includes the digital watermark in the content to be identified.
- When the selected identification manner is feature-value-based, the extracted identification information includes the feature value of the content to be identified.
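The mapping from identification manner to extracted identification information can be sketched as follows. This is a hedged illustration rather than the patent's implementation: the `Content` class, the string names of the manners, and the dictionary keys are hypothetical. For metadata, the sketch follows the later statement in this document that metadata mainly refers to an MD5 or SHA-series digest of the content, and uses SHA-256 as one such algorithm. Watermark and feature-value extraction depend on algorithms the text does not specify, so they are left unimplemented.

```python
import hashlib
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class Content:
    """Hypothetical container for one piece of monitored content."""
    name: str
    data: bytes
    source_address: str
    destination_address: str


def extract_identification_info(content: Content, manner: str) -> Dict[str, Any]:
    """Return the identification information matching the selected manner."""
    if manner == "content_identifier":
        # Content identifier plus first auxiliary information (content size).
        return {"content_id": content.name, "size": len(content.data)}
    if manner == "tampering_information":
        # Content identifier plus second auxiliary information (addresses).
        return {
            "content_id": content.name,
            "source_address": content.source_address,
            "destination_address": content.destination_address,
        }
    if manner == "content_metadata":
        # Metadata taken here to be a digest of the content (SHA-256 assumed).
        return {"metadata": hashlib.sha256(content.data).hexdigest()}
    # Watermark and feature-value extraction are algorithm-specific.
    raise NotImplementedError(f"extraction for {manner!r} is not sketched here")
```

The simpler manners touch only a name, a length, or two addresses, while the metadata digest already requires reading every byte of the content, which matches the simple-to-complex ordering described above.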
- Multiple identification manners may also be selected, for example several manners other than the most complex one, with the identification information required by each manner extracted separately.
- The content management server may then adopt the corresponding identification manners in an execution order determined by their complexity. If all of the selected manners have been tried and the content still has not been identified, the content management client may select the most complex identification manner and initiate the content identification process again.
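A minimal sketch of trying several selected manners in order of complexity is shown below. The function name `identify_in_order`, the `EXECUTION_ORDER` tuple, and the convention that each per-manner identifier returns `None` on failure are assumptions for illustration; the source only states that the manners are tried in a complexity-based order.

```python
from typing import Callable, Dict, Optional

# A per-manner identifier takes identification information and returns the
# content attributes on success or None on failure (hypothetical convention).
Identifier = Callable[[dict], Optional[dict]]

# Assumed execution order, from the simplest manner to the most complex.
EXECUTION_ORDER = (
    "content_identifier",
    "tampering_information",
    "content_metadata",
    "watermark",
    "feature_value",
)


def identify_in_order(request_info: Dict[str, dict],
                      identifiers: Dict[str, Identifier]) -> Optional[dict]:
    """Try each manner present in the request, simplest first."""
    for manner in EXECUTION_ORDER:
        if manner not in request_info:
            continue  # the client did not select this manner
        attributes = identifiers[manner](request_info[manner])
        if attributes is not None:
            return attributes  # stop at the first successful manner
    return None  # every selected manner failed
```

When the function returns `None`, the client-side fallback described above would apply: the client selects the most complex manner and sends a new request.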
- Step 13: The content management client sends a first content identification request to the content management server.
- The first content identification request includes the selected identification manner and the identification information corresponding to that manner, and is used to request the content management server to adopt the identification manner and identify the attribute of the content to be identified according to the identification information.
- In this embodiment, the identification information required by the selected identification manner is extracted from the content to be identified, and a first content identification request including the identification manner and the identification information is sent to the content management server. The request instructs the content management server to adopt the included identification manner and to identify the attribute of the content to be identified according to the included identification information. The identification manner can therefore be flexibly selected according to the load of the content identification system, the specific application scenario, or the actual security requirement, which reduces the load of the content identification system and improves the efficiency of content identification.
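As an illustration of what the first content identification request carries, the sketch below models it as a small message that can be serialized and parsed. The source does not specify a message format, so the class name, the field names, and the use of JSON are all assumptions.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Dict, List


@dataclass
class FirstContentIdentificationRequest:
    """Hypothetical message: selected manner(s) plus matching information."""
    manners: List[str]
    identification_info: Dict[str, Dict[str, Any]]

    def to_json(self) -> str:
        # JSON is an assumed wire format, chosen only for illustration.
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, raw: str) -> "FirstContentIdentificationRequest":
        obj = json.loads(raw)
        return cls(manners=obj["manners"],
                   identification_info=obj["identification_info"])
```

A client would build the message from the manner it selected and the information it extracted, and the server would parse it to recover both before identification.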
- A content database may be set up on the content management server to store content data information.
- The content data information stored in the content database may include the real attribute information of the content, tampering records, real watermark information, real metadata, feature values of protected content, or other content data information. The real attribute information includes the real identifier and the real size information of the content; the real identifier may be the real name of registered content or the unique identifier assigned by the content management server to registered content. A tampering record includes a content tampering identifier, content identification time information, and content source address and target address information; the content tampering identifier may be a tampered name of registered content or an identifier assigned by the content management server to a tampered version of registered content.
- After receiving the first content identification request, the content management server may adopt the identification manner selected by the content management client and identify the attribute of the corresponding content according to the identification information extracted by the content management client and the content data information pre-stored in the content database. If the content management server identifies the content successfully and sends the attribute information of the content to be identified to the content management client, the content management client may process the content as necessary according to the obtained attribute information, for example by filtering, blocking, or otherwise controlling the content.
- FIG. 2 is a flow chart of a second embodiment of a content identification method according to the present invention.
- In this embodiment, the attribute of the content to be identified is identified using a simple identification manner (content-identifier-based identification). As shown in FIG. 2, this embodiment includes:
- Step 21: According to the load of the content identification system or a preset security requirement, the content management client selects content-identifier-based identification as the identification manner for the content to be identified, and extracts the identification information that this manner requires. The identification information may include the content identifier (ID) of the content to be identified and first auxiliary identification information, and the first auxiliary identification information may include content size (Size) information.
- Step 22: The content management client sends a first content identification request to the content management server. The first content identification request includes the identification manner selected by the content management client (content-identifier-based identification) and the corresponding identification information (content identifier, content name, content size information, etc.), and instructs the content management server to adopt the identification manner and identify the attribute of the content to be identified according to the identification information.
- Step 23: The content management server receives the first content identification request sent by the content management client and, according to the content identifier (ID) in the identification information, queries whether the real attribute information stored on the content management server contains a real content identifier identical to that content identifier. If so, step 24 is performed; if not, step 27 is performed. The real attribute information stored on the content management server may include the real identifier, real name, and real size information of the content.
- Step 24: According to the first auxiliary identification information (content size information and the like) in the identification information, the content management server queries whether the corresponding information stored on the content management server matches. For example, the content management server compares the content size information included in the identification information with the stored real size information of the content. If they match, step 25 is performed; if not, step 27 is performed.
- Step 25: The content management server identifies the content successfully and obtains the corresponding real attribute information of the content.
- Step 26: The content management server sends an identification success message to the content management client, where the identification success message includes the identification result and the attribute information of the content to be identified. The process ends.
- Step 27: The content management server sends a content identification failure message to the content management client. The process ends.
- In this embodiment, the identification manner selected for the content to be identified is content-identifier-based identification.
- The content management server identifies the attribute of the content to be identified using the content-identifier-based manner, which significantly reduces the load of the content identification system and improves the efficiency of content identification. This embodiment suits scenarios in which the security requirements for identification are not very strict.
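The server-side flow of steps 23 to 27 can be sketched as below. The in-memory dictionary stands in for the real attribute information stored on the content management server; its layout, the sample record, and the shape of the returned messages are hypothetical.

```python
from typing import Any, Dict

# Hypothetical stand-in for the real attribute information held by the server:
# real content identifier -> real size and content attributes.
REAL_ATTRIBUTES: Dict[str, Dict[str, Any]] = {
    "content-001": {
        "size": 1024,
        "attributes": {"copyright_owner": "Example Studio", "rule": "block"},
    },
}


def identify_by_content_identifier(info: Dict[str, Any],
                                   store: Dict[str, Dict[str, Any]] = REAL_ATTRIBUTES
                                   ) -> Dict[str, Any]:
    """Content-identifier-based identification following steps 23 to 27."""
    # Step 23: look for a real content identifier equal to the received one.
    record = store.get(info.get("content_id"))
    if record is None:
        return {"result": "failure"}  # Step 27
    # Step 24: check the first auxiliary information (content size).
    if record["size"] != info.get("size"):
        return {"result": "failure"}  # Step 27
    # Steps 25 and 26: identification succeeded; return the attributes.
    return {"result": "success", "attributes": record["attributes"]}
```

For example, `identify_by_content_identifier({"content_id": "content-001", "size": 1024})` reports success with the stored attributes, while a wrong size or an unknown identifier yields a failure message. Both checks are simple lookups and comparisons, which is why this manner places little load on the system.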
- FIG. 3 is a flow chart of a third embodiment of a content identification method according to the present invention.
- In this embodiment, the content management client selects multiple (for example, three) identification manners, and the content management server adopts the manners selected by the content management client one after another, in a preset execution order, to identify the attribute of the content to be identified. As shown in FIG. 3, this embodiment includes:
- Step 31: The content management client selects the identification manners for the content to be identified according to the load of the content identification system or a preset security requirement. The selected manners are content-identifier-based, tampering-information-based, and content-metadata-based identification. The content management client extracts the identification information corresponding to each manner separately:
- for the content-identifier-based manner, the content identifier (ID) of the content to be identified and first auxiliary identification information, which may include content size (Size) information;
- for the tampering-information-based manner, the content identifier and second auxiliary identification information, which may include content source address and destination address (Source/Destination address) information;
- for the content-metadata-based manner, metadata (Metadata). The metadata here mainly refers to a hash of the content, that is, a digest of the content computed with MD5 or an SHA-series algorithm.
- Step 32 The content management client sends a first content identification request to the content management server. The first content identification request includes the identification manners selected by the content management client (the content identifier-based, tamper information-based, and content metadata-based identification manners) and the identification information corresponding to each identification manner.
- Step 33 The content management server receives the first content identification request and applies the corresponding identification manners one after another, in the preset execution order from the simplest identification manner to the most complex, to identify the attributes of the content to be identified. That is, it first uses the content identifier-based identification manner with the identification information corresponding to that manner: according to the content identifier (ID) in the identification information, it queries whether the real attribute information of content stored on the content management server contains a real content identifier that is the same as the content identifier. If so, identification based on the content identifier succeeds, and step 36 is performed; if not, identification based on the content identifier fails, and step 34 is performed. The real attribute information of content stored by the content management server may include the real identifier of the content, the real name of the content, and the real size information of the content.
- Step 34 The content management server uses the tamper information-based identification manner. According to the identification information corresponding to that manner (the content identifier and the content source address information or content destination address information), it queries whether the tamper records stored by the content management server contain a content tampering identifier that matches the content identifier. If such a content tampering identifier exists, the server further checks whether the tamper records contain content source address information or content destination address information that matches the corresponding information included in the identification information. If they do, the server obtains the real content identifier, stored by the content management server, that corresponds to the content tampering identifier. The content to be identified has then been successfully identified with the tamper information-based identification manner, the content management server does not need to identify the content with the content metadata-based identification manner, and step 35 is performed. If the tamper records contain no content tampering identifier matching the content identifier, or contain no content source address information or content destination address information matching that included in the identification information, the server uses the content metadata-based identification manner and identifies the content to be identified according to the identification information corresponding to that manner (the hash value of the content) (not shown in FIG. 3).
- the tampering record may include a content tampering identifier, content identification time information, content source address, and destination address information, and the like.
- Step 35 The content management server obtains the real identifier of the content according to the content tampering identifier, and queries the real attribute information of the content stored by the content management server according to the real identifier of the content.
- Step 36 The content management server sends an identification success message to the content management client, where the identification success message includes the identification result and the attribute information of the content to be identified;
- Step 37 The content management server sends a content identification failure message to the content management client; and ends the process.
- In this embodiment, the content management client may select multiple identification manners for the content to be identified according to the load condition of the content identification system or a preset security requirement, and the content management server applies the corresponding identification manners in a preset order or rule (e.g., from simple to complex content identification manners), performing content identification according to the identification information corresponding to the manner currently in use. Once the content management server has successfully identified the attributes of the content to be identified with a simpler identification manner, it no longer applies the more complex identification manners selected by the content management client. For example, although the content management client selects multiple identification manners including the tamper information-based and metadata-based identification manners, if the content management server successfully identifies the content with the tamper information-based identification manner, it does not need to use the metadata-based identification manner. This reduces the load on the content identification system and improves the efficiency of content identification.
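The ordered, stop-at-first-success behaviour of steps 33 to 37 can be sketched as follows. This is a minimal illustration, not the patented implementation: the class names, field names, mode names, and in-memory dictionaries are all assumptions introduced here, and the request is modelled as a plain mapping from each selected identification manner to its identification information.

```python
from dataclasses import dataclass, field


@dataclass
class TamperRecord:
    # Hypothetical record layout: tampered identifier, the real identifier it
    # maps to, and the addresses seen when the tampering was recorded.
    tampered_id: str
    real_id: str
    source_address: str
    destination_address: str


@dataclass
class ContentStore:
    real_attributes: dict = field(default_factory=dict)  # real_id -> attributes
    tamper_records: list = field(default_factory=list)
    metadata_index: dict = field(default_factory=dict)   # content hash -> real_id


def by_content_id(store, info):
    # Step 33: the claimed identifier must equal a stored real identifier.
    cid = info.get("content_id")
    return cid if cid in store.real_attributes else None


def by_tamper_info(store, info):
    # Step 34: match the tampering identifier, then source or destination address.
    for rec in store.tamper_records:
        if rec.tampered_id != info.get("content_id"):
            continue
        if (rec.source_address == info.get("source_address")
                or rec.destination_address == info.get("destination_address")):
            return rec.real_id
    return None


def by_metadata(store, info):
    # Fallback: look the content hash up among the stored real metadata.
    return store.metadata_index.get(info.get("metadata"))


# Preset execution order, simplest manner first (assumed names).
EXECUTION_ORDER = [
    ("content_id", by_content_id),
    ("tamper_info", by_tamper_info),
    ("metadata", by_metadata),
]


def identify(store, request):
    """Apply the manners the client selected, in order, stopping at the first success."""
    for mode, matcher in EXECUTION_ORDER:
        if mode not in request:
            continue
        real_id = matcher(store, request[mode])
        if real_id is not None:
            return {"result": "success", "mode": mode, "real_id": real_id,
                    "attributes": store.real_attributes.get(real_id)}
    return {"result": "failure"}
```

Because the loop returns as soon as one manner succeeds, the more expensive manners later in the order are never evaluated, which is the load reduction the embodiment describes.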
- FIG. 4 is a flow chart of a fourth embodiment of a content identification method according to the present invention.
- In this embodiment, when the content management server fails to identify the content, it may send correct identification indication information to the content management client, instructing the content management client to provide the content feature value corresponding to the content to be identified. Alternatively, when the content management client receives the identification failure message sent by the content management server and the identification manner it originally selected is not the feature value-based identification manner, the content management client selects a new identification manner.
- As shown in FIG. 4, this embodiment includes:
- Step 41 According to the load condition of the content identification system or a preset security requirement, the content management client selects the content metadata-based identification manner as the identification manner for the content to be identified, and extracts the identification information corresponding to that manner. The identification information includes the metadata (MD5 Value), the content identifier (ID), and the name (Name) of the content to be identified, etc.
- Step 42 The content management client sends a first content identification request to the content management server, where the first content identification request includes the content metadata-based identification manner selected by the content management client and the identification information corresponding to that manner.
- Step 43 The content management server searches the content data stored on the content management server according to the metadata (MD5 Value). If the stored content data contains real metadata matching the metadata (MD5 Value), the content management server has successfully identified the content, and step 49 is performed; if the stored content data contains no real metadata matching the metadata (MD5 Value), step 44 is performed.
- Step 44 The content management server sends an identification failure message to the content management client, returning the identification result.
- The identification failure message may also carry correct identification indication information, which instructs the content management client to supplement the identification information of the content to be identified. It can be understood that the correct identification indication information may also be sent by the content management server to the content management client as a separate message.
- Step 45 When the content management client receives the identification failure message sent by the content management server, it selects, on its own initiative, the feature value-based identification manner as the identification manner for the content to be identified and extracts the identification information corresponding to that manner, that is, the feature value of the content to be identified.
- Step 46 The content management client sends a second content identification request to the content management server. The second content identification request includes the new identification manner (the feature value-based identification manner) and the supplementally extracted identification information (the feature value of the content to be identified).
- Step 47 The content management server receives the second content identification request, and uses the content feature value to retrieve, among the protected content data values stored on the content management server, whether there is a protected content feature value that matches the content feature value of the content to be identified. If yes, the content recognition is successful, and step 48 is performed; if not, the content recognition fails, and step 410 is performed.
- Step 48 The content management server updates the tampering record, where the information recorded in the tampering record includes: a real identifier (ID) of the content, various identifiable identifiers (IDs), a date on which the identification record occurs, and a source of the identification record occurrence Address and destination address, etc.
- Step 49 The content management server sends an identification success message to the content management client, where the identification success message includes the identification result and the attribute information of the content to be identified;
- Step 410 The content management server sends a content identification failure message to the content management client. The process ends.
- In this embodiment, when the content management client requests identification with a relatively simple identification manner (for example, the metadata-based identification manner) and the content management server fails to identify the content, the content management server sends correct identification indication information instructing the content management client to select a more complex identification manner (e.g., the feature value-based identification manner) and to supplement the required identification information. Alternatively, the content management client may, on receiving the identification failure message sent by the content management server, select the more complex identification manner on its own initiative.
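The two-request exchange of the fourth embodiment can be sketched as below, under several assumptions. The MD5 digest uses Python's standard `hashlib`; the "feature value" function is only a stand-in (a SHA-256 digest of the lower-cased bytes), because the text does not specify how content feature values are computed. `ToyServer`, `client_identify`, and the message dictionaries are hypothetical names for illustration.

```python
import hashlib


def md5_metadata(content: bytes) -> str:
    # Metadata in the sense of step 41: a hash (digest) of the content.
    return hashlib.md5(content).hexdigest()


def toy_feature_value(content: bytes) -> str:
    # Stand-in only: real feature extraction is not specified in the text.
    # Lower-casing makes this toy value survive a trivial modification.
    return hashlib.sha256(content.lower()).hexdigest()


class ToyServer:
    def __init__(self, protected: bytes, real_id: str):
        self.metadata_index = {md5_metadata(protected): real_id}
        self.feature_index = {toy_feature_value(protected): real_id}
        self.tamper_records = []

    def handle(self, mode: str, info: dict) -> dict:
        if mode == "metadata":
            real_id = self.metadata_index.get(info["metadata"])
            if real_id is None:
                # Step 44: failure message carrying the indication to supplement.
                return {"result": "failure", "supplement": "feature_value"}
            return {"result": "success", "real_id": real_id}
        if mode == "feature_value":
            real_id = self.feature_index.get(info["feature_value"])
            if real_id is None:
                return {"result": "failure"}
            # Step 48: record the identifier seen against the real identifier.
            self.tamper_records.append(
                {"real_id": real_id, "tampered_id": info.get("content_id")})
            return {"result": "success", "real_id": real_id}
        return {"result": "failure"}


def client_identify(server: ToyServer, content: bytes, content_id: str) -> dict:
    # First content identification request: metadata-based manner.
    first = server.handle(
        "metadata", {"metadata": md5_metadata(content), "content_id": content_id})
    if first["result"] == "success":
        return first
    # Second content identification request: feature value-based manner.
    return server.handle(
        "feature_value",
        {"feature_value": toy_feature_value(content), "content_id": content_id})
```

In this sketch an unmodified copy is resolved by the cheap hash lookup alone, while a modified copy fails the hash lookup and is resolved only by the second, more expensive request, after which the server records the tampering.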
- FIG. 5 is a flowchart of a fifth embodiment of a content identification method according to the present invention.
- In this embodiment, the content management client selects multiple identification manners other than the most complex one, and the content management server applies the corresponding identification manners one after another in order of complexity. When all of these identification manners have been applied and the content still has not been identified, the content management client can select the most complex identification manner and initiate the content identification process again. As shown in FIG. 5, this embodiment includes:
- Step 51 The content management client selects the identification manners other than the most complex (or highest-level) one and separately extracts the identification information required by each of them. For example, the most complex (or highest-level) identification manner is the feature value-based identification manner, and the other identification manners supported by the content management client include the content identifier-based, tamper information-based, metadata-based, and watermark-based identification manners.
- Correspondingly, for the content identifier-based identification manner, the identification information extracted by the content management client includes the content identifier and content size information of the content to be identified, etc.; for the tamper information-based identification manner, it includes the content identifier of the content to be identified, etc.; for the watermark-based identification manner, it includes the digital watermark in the content to be identified; and for the feature value-based identification manner, it includes the content feature value of the content to be identified, etc.
- Step 52 The content management client sends a first content identification request message to the content management server, where the first content identification request message includes one or more identification manners selected by the content management client and the identification information corresponding to each identification manner.
- Step 53 The content management server applies the corresponding identification manners one after another, in the preset execution order from the simplest identification manner to the most complex, to identify the attributes of the content to be identified; that is, it applies in turn the content identifier-based, tamper information-based, metadata-based, and watermark-based identification manners. When the content management server fails to identify the content to be identified with all of these identification manners, step 55 is performed.
- If the content management server fails to identify the content with the content identifier-based identification manner, the tamper information-based identification manner may be applied; if it fails with the tamper information-based identification manner, the metadata-based identification manner may be applied; and if it fails with the metadata-based identification manner, the watermark-based identification manner may be applied. For a detailed description of how the content management server identifies the attributes of the content to be identified with the content identifier-based, tamper information-based, and metadata-based identification manners, refer to the descriptions of the first to fourth embodiments of the content identification method of the present invention; details are not repeated here.
- the watermark recognition method requires the content management client to negotiate watermark related information with the content management server in advance, including watermark embedding and extraction algorithms, watermark embedding location information and the like.
- the watermark recognition method is applicable to a scenario where the content management client is located at the monitoring gateway, the website or the SP.
- The content management client extracts the digital watermark information contained in the content in step 51 and sends it to the content management server in step 52. If the content management client fails to extract the watermark information in step 51, for example because the watermark is corrupted, the content management server cannot use the watermark-based identification manner. If the content management client successfully extracts the watermark information and sends it to the content management server, the content management server searches the content database, according to the watermark information, for a stored watermark matching the extracted watermark information; if one is found, content identification succeeds, and if not, content identification fails. FIG. 5 illustrates only the case in which the content management server fails to identify the content to be identified with the content identifier-based, tamper information-based, metadata-based, and watermark-based identification manners.
- Step 54 The content management server sends an identification failure message to the content management client, and returns a recognition result, and step 55 is performed.
- Step 55 When receiving the identification failure message sent by the content management server, the content management client selects the feature value-based identification manner as the identification manner for the content to be identified and extracts the identification information corresponding to that manner, that is, the feature value of the content to be identified.
- Step 56 The content management client sends a second content identification request to the content management server. The second content identification request includes the new identification manner (the feature value-based identification manner) and the supplementally extracted identification information (the feature value of the content to be identified).
- Step 57 The content management server receives the second content identification request and uses the content feature value to search the protected content data stored on the content management server for a protected content feature value that matches the content feature value of the content to be identified. If one is found, content identification succeeds, and step 58 is performed; if not, content identification fails, and step 510 is performed.
- Step 58 The content management server updates the tampering record, wherein the information recorded in the tampering record comprises: a real identifier (ID) of the content, various identifiable identifiers (IDs), a date on which the identification record occurs, and a source of the identification record occurrence Address and destination address, etc.
- Step 59 The content management server sends an identification success message to the content management client, where the identification success message includes the identification result and the attribute information of the content to be identified;
- Step 510 The content management server sends a content identification failure message to the content management client. The process ends.
- In this embodiment, the content management client first selects a plurality of relatively simple identification manners (e.g., the content identifier-based, tamper information-based, metadata-based, and watermark-based identification manners) to request the content management server to identify the content to be identified, and selects a more complex identification manner (e.g., the feature value-based identification manner) only when these fail. Because the simpler identification manners (for example, the tamper information-based identification manner) are selected preferentially, the load on the content identification system is reduced and the efficiency of content identification is improved.
- This embodiment shows the case in which the content management client sends a second content identification request message to the content management server.
- The content management client may also initiate three or more content identification requests. For example, the content management client may re-initiate a content identification request each time the content management server fails to identify the content, until the content management client has selected the most complex identification manner that the content identification system supports; if the content management server still fails to identify the content, the content management client stops sending content identification requests to the content management server.
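The repeated re-initiation described above amounts to a bounded escalation loop on the client. The sketch below assumes one identification manner per request and a caller-supplied `send` callable that returns whether the server identified the content; the mode names and their ordering are illustrative assumptions, not terms fixed by the text.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical mode names, ordered from the simplest to the most complex manner.
MODES_BY_COMPLEXITY = [
    "content_id", "tamper_info", "metadata", "watermark", "feature_value",
]


def identify_with_escalation(
    send: Callable[[str], bool],
    modes: Sequence[str] = MODES_BY_COMPLEXITY,
) -> Tuple[bool, List[str]]:
    """Send one request per manner, escalating after each failure.

    Returns (success, manners attempted). The loop stops at the first success,
    or after the most complex manner has been tried and has also failed.
    """
    attempted: List[str] = []
    for mode in modes:
        attempted.append(mode)
        if send(mode):
            return True, attempted
    return False, attempted
```

The loop is bounded by the number of supported manners, so the client cannot keep re-sending requests once the most complex manner has failed.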
- FIG. 6 is a flow chart of another embodiment of a content identification method according to the present invention. As shown in FIG. 6, this embodiment includes:
- Step 61 The content management server acquires, according to the received first content identification request from the content management client, an identification manner included in the first content identification request and identification information corresponding to the identification manner.
- Step 62 The content management server adopts the identification manner included in the first content identification request, and identifies the attribute of the content to be identified according to the identification information included in the first content identification request and the content data information pre-stored on the content management server.
- In this embodiment, the content management server receives the first content identification request sent by the content management client, adopts the identification manner included in the first content identification request, and identifies the attributes of the content to be identified according to the identification information included in that request. Because the identification manner and identification information selected by the content management client are the ones used in identifying the content, the content management client can flexibly select the content identification manner according to the load condition of the content identification system or the actual security requirement, which helps reduce the load on the content identification system and improve the efficiency of content identification.
- For the identification manners that the content management client includes in the first content identification request, the identification information required by each identification manner, and the detailed description of how the content management server identifies content according to a specific identification manner and the corresponding identification information, refer to the first to fifth embodiments of the content identification method of the present invention and the description of FIGS. 1-5; details are not repeated here.
- FIG. 7 is a schematic structural diagram of a first embodiment of a content management client according to the present invention. As shown in FIG. 7, the embodiment includes a selection module 71, an extraction module 72, and a sending module 73.
- the selection module 71 is configured to select a recognition method of the content to be identified.
- the extraction module 72 is configured to extract identification information of the content to be identified corresponding to the identification mode selected by the selection module 71.
- the sending module 73 is configured to send a first content identification request, where the first content identification request includes the identification manner selected by the selection module 71 and the identification information extracted by the extraction module 72, and is used to request the content management server to adopt the identification manner and identify the attributes of the content to be identified based on the identification information.
- In this embodiment, the selection module selects the identification manner for the content to be identified, the extraction module extracts from the content to be identified the identification information required for content identification with the selected identification manner, and the sending module sends to the content management server a first content identification request carrying the identification manner and the identification information. The request instructs the content management server to adopt the identification manner included in the first content identification request and to identify the attributes of the content to be identified according to the identification information included in that request. The content management client can thus flexibly select the content identification manner according to the load condition of the content identification system or the actual security requirement, which reduces the load on the content identification system and improves the efficiency of content identification.
- FIG. 8 is a schematic structural diagram of a second embodiment of a content management client according to the present invention.
- In this embodiment, the identification manner includes the content identifier-based, tamper information-based, content metadata-based, watermark-based, or feature value-based identification manner, or other identification manners;
- the identification information includes the content identifier, content size information, content source address information, content destination address information, metadata, digital watermark, or content feature value of the content to be identified, or other identification information needed to identify the content with the corresponding identification manner.
- the selection module 71 is specifically configured to select, according to the load condition of the content identification system or a preset security requirement, the content identifier-based, tamper information-based, content metadata-based, watermark-based, or feature value-based identification manner as the identification manner for the content to be identified.
- the extraction module 72 includes at least one of the following units: a content information extraction unit 721, a tampering information extraction unit 722, a metadata extraction unit 723, a watermark extraction unit 724, and a feature value extraction unit 725.
- the content information extracting unit 721 is configured to extract the content identifier of the content to be identified and the first auxiliary identification information when the identification mode selected by the selection module 71 is based on the content identification identification mode, where the first auxiliary identification information includes content size information.
- the tampering information extracting unit 722 is configured to: when the identification manner selected by the selection module 71 is the tamper information-based identification manner, extract the content identifier of the content to be identified and the second auxiliary identification information, where the second auxiliary identification information includes the content source address information or content destination address information.
- the metadata extracting unit 723 is configured to extract metadata of the content to be identified when the identification mode selected by the selecting module 71 is based on the content metadata identification manner.
- the watermark extraction unit 724 is configured to extract the digital watermark in the content to be identified when the recognition mode selected by the selection module 71 is based on the watermark recognition mode.
- the feature value extraction unit 725 is configured to extract the content feature value of the content to be identified when the recognition mode selected by the selection module 71 is based on the feature value recognition mode.
- In this embodiment, the selection module can flexibly select the identification manner for the content to be identified according to the load condition of the content identification system or a preset security requirement, and the extraction module extracts the identification information required by the identification manner selected by the selection module, which helps reduce the load on the content identification system and improve the efficiency of content identification.
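The division of work between the selection module and the extraction units can be sketched as a small dispatch table. Everything named here is an assumption for illustration: the `Content` type, the mode names, the load threshold, and the selection policy are hypothetical. Watermark and feature value extraction units are left out because the text does not specify their algorithms.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Content:
    content_id: str
    data: bytes
    source_address: str
    destination_address: str


# One extraction unit per identification manner (hypothetical names).
EXTRACTION_UNITS: Dict[str, Callable[[Content], dict]] = {
    # Content identifier plus first auxiliary information (content size).
    "content_id": lambda c: {"content_id": c.content_id, "size": len(c.data)},
    # Content identifier plus second auxiliary information (addresses).
    "tamper_info": lambda c: {
        "content_id": c.content_id,
        "source_address": c.source_address,
        "destination_address": c.destination_address,
    },
    # Metadata: a digest of the content.
    "metadata": lambda c: {"metadata": hashlib.md5(c.data).hexdigest()},
}


def select_modes(system_load: float, strict_security: bool) -> List[str]:
    # Illustrative policy only; the 0.8 threshold is an arbitrary assumption.
    if strict_security:
        return ["content_id", "tamper_info", "metadata"]
    if system_load > 0.8:
        return ["content_id"]
    return ["content_id", "tamper_info"]


def build_first_request(content: Content, system_load: float,
                        strict_security: bool) -> dict:
    """Select manners, run the matching extraction units, and package the request."""
    modes = select_modes(system_load, strict_security)
    return {
        "modes": modes,
        "identification_info": {m: EXTRACTION_UNITS[m](content) for m in modes},
    }
```

Only the extraction units for the selected manners run, so a client under heavy load that selects the identifier-based manner alone never computes a content digest.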
- the selection module 71 may also be configured to select a new identification manner when the correct identification indication information sent by the content management server is received, or when the identification failure information sent by the content management server is received and the identification manner included in the first content identification request is not the feature value-based identification manner.
- the extraction module 72 is further configured to supplementally extract the identification information corresponding to the new identification manner; and the sending module 73 is further configured to send the second content identification request, where the second content identification request includes the new identification manner selected by the selection module 71 and the identification information supplementally extracted by the extraction module 72.
- the sending module 73 may be further configured to encapsulate, in the first content identification request message, the multiple identification manners selected by the selection module and the identification information extracted by the extraction module for each identification manner, and to send the message to the content management server.
- the content management client embodiment of the present invention can be implemented as a stand-alone device or as a function module integrated on an entity such as a monitoring gateway, a user terminal or a content sharing website.
- FIG. 9 is a schematic structural diagram of a first embodiment of a content management server according to the present invention. As shown in FIG. 9, the embodiment includes an acquisition module 91 and an identification module 92.
- the obtaining module 91 is configured to obtain, according to the received first content identification request from the content management client, an identification manner included in the first content identification request and identification information corresponding to the identification manner.
- the identification module 92 is configured to recognize the attribute of the content to be identified based on the identification information and the previously stored content data information.
- In this embodiment, the acquiring module acquires the identification manner and the identification information included in the first content identification request sent by the content management client, and the identification module identifies the attributes of the content to be identified using the acquired identification manner and identification information. The identification manner and identification information selected by the content management client are thus the ones used by the identification module when identifying the content, which allows the content management client to flexibly select the content identification manner according to the load condition of the content identification system or the actual security requirement, and helps reduce the load on the content identification system and improve the efficiency of content identification.
- FIG. 10 is a schematic structural diagram of a second embodiment of a content management server according to the present invention.
- the difference between this embodiment and the first embodiment of the content management server of the present invention is that the embodiment further includes a content database 93.
- the identification module 92 includes at least one of the following units: a content information identification unit 921, a tampering information identification unit 922, a metadata identification unit 923, a watermark identification unit 924, and a feature value identification unit 925.
- the content database 93 is for storing content data information;
- the content data information may include real attribute information of the content, tampering records, real watermark information, real metadata, protected content feature values or other information of the content;
- the real attribute information may include the real content identifier and the real content size information.
- the tampering record may include information such as a tampered content identifier, content identification time information, and content source address and destination address information.
- the content information identification unit 921 is configured to: when the obtained identification mode is the content-identifier-based mode and the obtained identification information includes the content identifier of the content to be identified and first auxiliary identification information, query the real attribute information stored in the content database for a real content identifier that matches the content identifier; if one exists, compare the first auxiliary identification information stored for that real content identifier with the corresponding information in the identification information; and if they are consistent, send a content identification success message. The first auxiliary identification information includes content size information.
- the tampering information identification unit 922 is configured to: when the obtained identification mode is the tampering-information-based mode and the obtained identification information includes the content identifier of the content to be identified and second auxiliary identification information, query the stored tampering records for a tampered content identifier that matches the content identifier; if such a record exists and the second auxiliary identification information matches the corresponding information in the record, obtain the real content identifier corresponding to the tampered content identifier; and query the real attribute information stored by the content management server according to that real content identifier. The second auxiliary identification information includes content source address information or content destination address information.
- the metadata identification unit 923 is configured to: when the identification mode obtained by the acquisition module is the content-metadata-based mode and the obtained identification information includes the metadata of the content to be identified, query the real metadata stored in the content database for real metadata that matches the metadata; if a match exists, send a content identification success message.
- the watermark identification unit 924 is configured to: when the identification mode obtained by the acquisition module is the watermark-based mode and the obtained identification information includes the digital watermark in the content to be identified, query the real watermark information stored in the content database for real watermark information that matches the digital watermark; if a match exists, send a content identification success message.
- the feature value identification unit 925 is configured to: when the identification mode obtained by the acquisition module is the feature-value-based mode and the obtained identification information includes the content feature value of the content to be identified, query the protected content feature values stored in the content database for one that matches the content feature value of the content to be identified; if a match exists, send a content identification success message.
- in this embodiment, the acquisition module obtains, from the first content identification request sent by the content management client, the identification mode and the identification information selected by the client, and the identification module identifies the content according to that mode and information and the content data information stored in the content database. The content management client can therefore flexibly select the identification mode for the content to be identified according to the load on the content identification system or preset security requirements, which helps reduce the load on the system and improve the efficiency of content identification.
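The "at least one of the following units" structure of the identification module can be sketched as a dispatch table keyed by identification mode. This is an illustrative sketch only: the class, the mode strings, and the database layout are assumptions, and exact-match lookup stands in for whatever matching a real watermark or feature-value comparison would use.

```python
class IdentificationModule:
    """Routes a request to the unit that handles its identification mode."""

    def __init__(self, content_database):
        self.db = content_database
        # Only three of the five units are registered in this sketch.
        self.units = {
            "metadata": self._metadata_unit,
            "watermark": self._watermark_unit,
            "feature_value": self._feature_value_unit,
        }

    def identify(self, mode, info):
        unit = self.units.get(mode)
        if unit is None:
            return "unsupported"
        return "success" if unit(info) else "failure"

    def _metadata_unit(self, info):
        return info["metadata"] in self.db["real_metadata"]

    def _watermark_unit(self, info):
        return info["watermark"] in self.db["real_watermarks"]

    def _feature_value_unit(self, info):
        # Exact match is a simplification made for this sketch.
        return info["feature_value"] in self.db["protected_feature_values"]


module = IdentificationModule({
    "real_metadata": {"md-1"},
    "real_watermarks": {"wm-1"},
    "protected_feature_values": {"fv-1"},
})
```

For example, `module.identify("watermark", {"watermark": "wm-1"})` reports success, while a mode with no registered unit is reported as unsupported rather than as a failed identification.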
- the identification module may further include an identification indication information sending unit.
- the identification indication information sending unit is configured to send correct-identification indication information to the content management client when the selected identification mode is the content-identifier-based mode, the tampering-information-based mode, the content-metadata-based mode, or the watermark-based mode, and content identification fails.
- the correct-identification indication information instructs the content management client to additionally provide the content feature value corresponding to the content to be identified.
- if, on receiving the content identification failure message sent by the content management server, the content management client itself initiates the process of selecting a new identification mode and additionally extracting the required identification information, the identification module 92 is further configured to receive a second content identification request message sent by the content management client, where the second content identification request message includes the new identification mode and the additionally provided identification information, and to identify the attributes of the content to be identified by using the new mode and according to the additionally provided information.
- on the basis of the technical solution of this embodiment, the content management server may further include an update module.
- the update module is configured to: when the identification mode obtained by the acquisition module is the content-metadata-based mode, the watermark-based mode, or the feature-value-based mode, and the identification module identifies the content successfully, update or store the tampering record corresponding to the content to be identified according to its real content identifier, tampered content identifier, content source address and destination address information, or content identification time information.
- the identification module may be further configured to: when the received first content identification request message includes multiple identification modes, apply the corresponding modes one after another, in a preset execution order, to identify the attributes of the content to be identified.
- FIG. 11 is a schematic structural diagram of an embodiment of a content identification system according to the present invention. As shown in FIG. 11, this embodiment includes a content management client 111 and a content management server 112.
- the content management client 111 is configured to select an identification mode for the content to be identified; extract the identification information of the content to be identified that corresponds to the selected mode; and send a first content identification request that includes the selected identification mode and the identification information.
- the content management server 112 is configured to obtain, from the first content identification request received from the content management client, the identification mode included in the request and the identification information corresponding to that mode; and to identify the attributes of the content to be identified by using that mode, according to the identification information and the pre-stored content data information.
- in this embodiment, the content management client selects the identification mode for the content to be identified, extracts from the content the identification information needed for identification in the selected mode, and sends the content management server a first content identification request that includes the mode and the information. The content management server uses the mode included in the request and identifies the attributes of the content according to the included information, so that the client can flexibly select the identification mode according to the load on the content identification system or actual security requirements, which helps reduce the load on the system and improve the efficiency of content identification.
- for the detailed function modules of the content management client, see the content management client embodiments of the present invention and the description of FIGS. 7-8; for those of the content management server, see the content management server embodiments and the description of FIGS. 9-10. Details are not repeated here.
- the modules of the apparatus in an embodiment may be distributed in the apparatus as described in the embodiment, or may be relocated, with corresponding changes, to one or more apparatuses different from that of the embodiment.
- the modules of the above embodiments may be combined into one module, or may be further split into a plurality of sub-modules.
- the foregoing program may be stored in a computer-readable storage medium and, when executed, performs the steps of the foregoing method embodiments; the storage medium includes any medium that can store program code, such as ROM, RAM, a magnetic disk, or an optical disc.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Technology Law (AREA)
- Computer Hardware Design (AREA)
- Computer Security & Cryptography (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Storage Device Security (AREA)
- Information Transfer Between Computers (AREA)
Description
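Content Identification Method and System, and Content Management Client and Server
Technical Field
The present invention relates to the field of information security technologies, and in particular to a content identification method and system and to a content management client and server.
Background
The spread of copyright-infringing content over networks is attracting growing public concern. Correctly identifying infringing content is the basis for filtering, blocking, or otherwise controlling such content as it spreads over a network.
The existing Secure Content Identification Mechanism (SCIDM) is a content-based identification mechanism built on feature (fingerprint) extraction (hereinafter: the feature-value-based identification mode). An SCIDM-based content identification system manages a content management server (SCIDM Server), also called a content management center (Content Identification Manager, CIM). Protected content is registered with the content management center, which extracts and stores feature values for the protected content and also stores its related attribute information (for example, copyright ownership information and copyright protection rules). In addition, the SCIDM-based content identification system defines a content management client (SCIDM Client), also called a monitoring entity (Monitor Entity, ME). The content management client monitors whether content passing through or sent to entities such as a monitoring gateway, a user terminal, or a content sharing website infringes copyright. During monitoring, the content management client extracts a feature value from the received content and sends it to the content management server, which searches locally for a protected-content feature value that matches the extracted one. If the server finds a match, it returns the related content attributes to the content management client, which filters, blocks, or otherwise controls the content according to those attributes.
In the course of implementing the present invention, the inventors found that, in the existing feature-value-based identification mode, both the client's extraction of content feature values and the server's lookup and matching based on those values consume substantial computing resources. Because the volume of audio and video content on networks is huge, this places a heavy load on both the content management client and the content management server, and identification efficiency is low.
Summary of the Invention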
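Embodiments of the present invention provide a content identification method and system and a content management client and server, which help reduce the load on the content identification system and improve identification efficiency.
A first aspect of the embodiments of the present invention provides a content identification method, including:
selecting an identification mode for content to be identified;
extracting identification information of the content to be identified that corresponds to the selected identification mode; and
sending a first content identification request, where the first content identification request includes the selected identification mode and the identification information, and is used to request a content management server to identify an attribute of the content to be identified by using the identification mode and according to the identification information.
In the content identification method provided in the first aspect, an identification mode is selected for the content to be identified, the identification information needed for identification in the selected mode is extracted from the content, and a first content identification request that includes the mode and the information is sent to the content management server, instructing it to identify the attribute of the content by using the included mode and according to the included information. The identification mode can therefore be selected flexibly according to the load on the content identification system or actual security requirements, which helps reduce the load on the system and improve the efficiency of content identification.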
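A second aspect of the embodiments of the present invention provides another content identification method, including:
obtaining, from a first content identification request received from a content management client, an identification mode included in the first content identification request and identification information corresponding to the identification mode; and
identifying an attribute of content to be identified by using the identification mode, according to the identification information and pre-stored content data information.
In the content identification method provided in the second aspect, the first content identification request sent by the content management client is received, and the attribute of the content to be identified is identified by using the mode included in the request and according to the information included in the request. The mode and information selected by the content management client thus serve as the mode and information used in the identification process, which lets the client flexibly select the identification mode according to the load on the content identification system or actual security requirements, and helps reduce the load on the system and improve the efficiency of content identification.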
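A third aspect of the embodiments of the present invention provides a content management client, including:
a selection module, configured to select an identification mode for content to be identified;
an extraction module, configured to extract identification information of the content to be identified that corresponds to the selected identification mode; and
a sending module, configured to send a first content identification request, where the first content identification request includes the selected identification mode and the identification information, and is used to request a content management server to identify an attribute of the content to be identified by using the identification mode and according to the identification information.
In the content management client provided in the third aspect, the selection module selects the identification mode, the extraction module extracts from the content the identification information needed for identification in the selected mode, and the sending module sends the content management server a first content identification request that includes the mode and the information, instructing it to identify the attribute of the content by using the included mode and according to the included information. The content management client can therefore flexibly select the identification mode according to the load on the content identification system or actual security requirements, which helps reduce the load on the system and improve the efficiency of content identification.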
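A fourth aspect of the embodiments of the present invention provides a content management server, including:
an acquisition module, configured to obtain, from a first content identification request received from a content management client, an identification mode included in the first content identification request and identification information corresponding to the identification mode; and
an identification module, configured to identify an attribute of content to be identified by using the identification mode, according to the identification information and pre-stored content data information.
In the content management server provided in the fourth aspect, the acquisition module obtains the mode and the information included in the first content identification request sent by the content management client, and the identification module uses them to identify the attribute of the content to be identified. The mode and information selected by the content management client thus serve as the mode and information used by the identification module, which lets the client flexibly select the identification mode according to the load on the content identification system or actual security requirements, and helps reduce the load on the system and improve the efficiency of content identification.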
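A fifth aspect of the embodiments of the present invention provides a content identification system, including:
a content management client, configured to select an identification mode for content to be identified; extract identification information of the content to be identified that corresponds to the selected identification mode; and send a first content identification request that includes the selected identification mode and the identification information; and
a content management server, configured to obtain, from the first content identification request received from the content management client, the identification mode included in the request and the identification information corresponding to the identification mode; and identify an attribute of the content to be identified by using the identification mode, according to the identification information and pre-stored content data information.
In the content identification system provided in the fifth aspect, the content management client selects the identification mode, extracts from the content the identification information needed for identification in the selected mode, and sends the content management server a first content identification request that includes the mode and the information; the content management server then identifies the attribute of the content by using the included mode and according to the included information. The content management client can therefore flexibly select the identification mode according to the load on the content identification system or actual security requirements, which helps reduce the load on the system and improve the efficiency of content identification.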
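Brief Description of the Drawings
FIG. 1 is a flowchart of a first embodiment of a content identification method according to the present invention;
FIG. 2 is a flowchart of a second embodiment of a content identification method according to the present invention;
FIG. 3 is a flowchart of a third embodiment of a content identification method according to the present invention;
FIG. 4 is a flowchart of a fourth embodiment of a content identification method according to the present invention;
FIG. 5 is a flowchart of a fifth embodiment of a content identification method according to the present invention;
FIG. 6 is a flowchart of an embodiment of another content identification method according to the present invention;
FIG. 7 is a schematic structural diagram of a first embodiment of a content management client according to the present invention;
FIG. 8 is a schematic structural diagram of a second embodiment of a content management client according to the present invention;
FIG. 9 is a schematic structural diagram of a first embodiment of a content management server according to the present invention;
FIG. 10 is a schematic structural diagram of a second embodiment of a content management server according to the present invention;
FIG. 11 is a schematic structural diagram of an embodiment of a content identification system according to the present invention.
Detailed Description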
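The technical solutions of the present invention are described in further detail below with reference to the accompanying drawings and embodiments.
FIG. 1 is a flowchart of a first embodiment of a content identification method according to the present invention. As shown in FIG. 1, this embodiment includes:
Step 11: The content management client selects an identification mode for the content to be identified.
The content management client may monitor content passing through or sent to entities such as a monitoring gateway, a user terminal, or a content sharing website, and during monitoring may perform content identification on that content as actually needed. When the content management client initiates the identification process for content to be identified, it may select the identification mode according to the load on the content identification system, the specific application scenario, or preset security requirements.
The identification modes that the content management client may select include: the content-identifier-based mode, the tampering-information-based mode, the content-metadata-based mode, the watermark-based mode, and the feature-value-based mode. In the order listed, these five modes go from simple to progressively more complex, while their robustness and security go from weak to progressively stronger; that is, simpler modes are less reliable and more complex modes are more reliable. For example, to evade identification by the content management client, the identifier and name of content are the easiest things to tamper with. Metadata is usually embedded in the file header and needs special tools to read or write, so tampering with content by altering its metadata is harder. A digital watermark is embedded at random into certain bits of the content; because the embedding positions are kept secret and are hard for ordinary users to learn, tampering with content by destroying the watermark is also hard. Feature extraction takes key features directly from the content and compares them against the feature values of protected content, so feature-value-based identification can be defeated only by destroying the key features of the content. Once those key features have been changed, however, the content itself has probably been re-adapted or severely tampered with. If the content has been re-adapted relative to the protected content, that is, re-created, it no longer infringes the original copyright of the protected content; if the content has been severely damaged, the viewing experience is very poor and such damage is usually pointless. Feature-value-based identification is therefore the most reliable and secure.
In the course of implementing the present invention, the inventors found that simple identification modes (for example, the content-identifier-based mode) still have a fairly high probability of identifying content attributes correctly. In practice, most users do not change the name or identifier of content, the probability that metadata is changed is lower, and the probability that a digital watermark is destroyed is lower still. The embodiments of the present invention therefore differ from the prior art in that not every identification needs a complex mode (for example, the feature-value-based mode); instead, a suitable mode can be selected flexibly according to the load on the content identification system, the specific application scenario, or preset security requirements. To obtain high identification efficiency in practice, the content management client selects the mode flexibly on that basis. For example, if the load on the content identification system is low, or the application scenario has low security requirements or a low preset security level, a simpler mode may be selected (for example, the content-identifier-based mode or the tampering-information-based mode), which reduces system load and saves system resources while still ensuring content security; if the load on the content identification system is high, or the application scenario has a high preset security level, a more complex mode may be selected (for example, the feature-value-based mode), which effectively ensures the reliability and security of content identification.
Step 12: The content management client extracts the identification information of the content to be identified that corresponds to the selected identification mode.
Once the content management client has selected the identification mode, it extracts the corresponding identification information from the content to be identified. That information may include the content identifier, content size information, content source address information, content destination address information, metadata, digital watermark, or content feature value of the content to be identified. Specifically, the content identifier may be the name (Name) of the content to be identified or the unique identifier that the content identification server assigned to registered content. When the selected mode is the content-identifier-based mode, the extracted identification information includes the content identifier of the content to be identified and first auxiliary identification information, which includes but is not limited to content size information. When the selected mode is the tampering-information-based mode, the extracted identification information includes the content identifier and second auxiliary identification information, which may include the source address information and destination address information of the content or other auxiliary identification information; the content source address information is the network address or network identifier of the entity that sent the content, and the content destination address information is the network address or network identifier of the entity to which the content is sent. When the selected mode is the watermark-based mode, the extracted identification information includes the digital watermark in the content to be identified. When the selected mode is the feature-value-based mode, the extracted identification information includes the content feature value of the content to be identified.
In addition, when sending a content identification request to the content management server, the content management client may select multiple identification modes, for example, several modes other than the most complex one, and extract the identification information needed for each of them. The content management server may apply the modes one after another in order of complexity. If all of them have been applied and the content still has not been identified, the content management client may select the most complex mode and initiate the identification process again.
Step 13: The content management client sends a first content identification request to the content management server. The first content identification request includes the selected identification mode and the identification information corresponding to it, and is used to request the content management server to identify the attribute of the content to be identified by using that mode and according to that information.
In this embodiment, an identification mode is selected for the content to be identified, the identification information needed for identification in the selected mode is extracted from the content, and a first content identification request that includes the mode and the information is sent to the content management server, instructing it to identify the attribute of the content by using the included mode and according to the included information. The identification mode can therefore be selected flexibly according to the load on the content identification system, the specific application scenario, or actual security requirements, which helps reduce the load on the system and improve the efficiency of content identification.
On the basis of the technical solution of this embodiment, a content database may be set up on the content management server to store content data information. The content data information stored in the content database may include: real attribute information of content, tampering records, real watermark information, real metadata, protected content feature values, or other content data information. The real attribute information includes the real content identifier, the real content size information, and so on; the real content identifier may be the real name of registered content or the unique identifier that the content management server assigned to registered content. A tampering record includes a tampered content identifier, content identification time information, content source address and destination address information, and so on; the tampered content identifier may be a tampered name of registered content or a tampered form of the identifier that the content management server assigned to registered content. On obtaining the identification mode selected by the content management client, the content management server may apply that mode and identify the attribute of the content according to the identification information extracted by the client and the content data information pre-stored in the content database. If the content management server identifies the content successfully and sends the attribute information of the content to the content management client, the client may process the content as necessary according to that attribute information, for example by filtering, blocking, or otherwise controlling it.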
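Step 12 pairs each identification mode with the information the client must extract. A minimal sketch of that pairing follows, assuming a hypothetical in-memory `content` record and hypothetical mode names; SHA-256 is used for the metadata digest as one member of the "SHA family" that the third embodiment mentions, and watermark and feature-value extraction are left out because the description does not specify their algorithms.

```python
import hashlib


def extract_identification_info(mode, content):
    """Return the identification information needed for the given mode.

    `content` is a hypothetical record with keys content_id, data (bytes),
    source and destination; none of these names come from the patent.
    """
    if mode == "content_id":
        # Content identifier plus first auxiliary information (content size).
        return {"content_id": content["content_id"], "size": len(content["data"])}
    if mode == "tampering_info":
        # Content identifier plus second auxiliary information (addresses).
        return {
            "content_id": content["content_id"],
            "source": content["source"],
            "destination": content["destination"],
        }
    if mode == "metadata":
        # Metadata treated as a digest of the content bytes.
        return {"metadata": hashlib.sha256(content["data"]).hexdigest()}
    raise ValueError("no extractor sketched for mode: " + mode)


sample = {
    "content_id": "movie-001",
    "data": b"hello",
    "source": "10.0.0.5",
    "destination": "10.0.0.9",
}
info = extract_identification_info("content_id", sample)
```

Only the cheap extractors run for the simple modes, which is where the load saving described above would come from on the client side.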
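FIG. 2 is a flowchart of a second embodiment of a content identification method according to the present invention. In this embodiment, the attribute of the content to be identified is identified by using a simple identification mode (the content-identifier-based mode). As shown in FIG. 2, this embodiment includes:
Step 21: The content management client selects, according to the load on the content identification system or preset security requirements, the content-identifier-based mode as the identification mode for the content to be identified, and extracts the corresponding identification information, that is, the information needed to identify content in the content-identifier-based mode. This information may include the content identifier (ID) of the content to be identified and first auxiliary identification information, which may include content size (Size) information.
Step 22: The content management client sends a first content identification request to the content management server. The request includes the identification mode selected by the client (the content-identifier-based mode) and the identification information corresponding to that mode (content identifier, content name, content size information, and so on), and instructs the content management server to identify the attribute of the content to be identified by using that mode and according to that information.
Step 23: The content management server receives the first content identification request and, according to the content identifier (ID) in the identification information, queries the real attribute information stored on the server for a real content identifier identical to the content identifier. If one exists, step 24 is performed; otherwise, step 27 is performed. The real attribute information stored on the content management server may include the real content identifier, the real content name, the real content size information, and so on.
Step 24: The content management server checks, according to the first auxiliary identification information (content size information and so on) in the identification information, whether it is consistent with the corresponding information stored on the server; for example, the server compares the content size information in the identification information with the stored real content size information. If they are consistent, step 25 is performed; otherwise, step 27 is performed.
Step 25: Content identification by the content management server succeeds, and the server obtains the corresponding real attribute information of the content.
Step 26: The content management server sends an identification success message to the content management client. The message includes the identification result and the attribute information of the content to be identified. The process ends.
Step 27: The content management server sends a content identification failure message to the content management client. The process ends.
In this embodiment, the content-identifier-based mode is selected according to the load on the content identification system or preset security requirements (for example, when the load on the system is high or the preset security level is low), and the content management server identifies the attribute of the content in that mode. This markedly reduces the load on the content identification system and improves the efficiency of content identification, and suits scenarios whose security identification requirements are not very strict.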
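Steps 23 to 27 amount to a keyed lookup followed by a size comparison. The sketch below illustrates that flow under assumptions: the registry contents, attribute names, and return structure are hypothetical, chosen only to make the example self-contained.

```python
# Hypothetical registry of real attribute information, keyed by real content ID.
REAL_ATTRIBUTES = {
    "movie-001": {"size": 1024, "owner": "Studio A", "rule": "block"},
}


def identify_by_content_id(content_id, size):
    record = REAL_ATTRIBUTES.get(content_id)  # step 23: look up the identifier
    if record is None:
        return {"result": "failure"}          # step 27: no matching identifier
    if record["size"] != size:                # step 24: compare the content size
        return {"result": "failure"}          # step 27: auxiliary info differs
    # Steps 25-26: success, return the stored attributes with the result.
    return {"result": "success", "attributes": record}


reply = identify_by_content_id("movie-001", 1024)
```

The server does no hashing or feature matching on this path, which is why the description characterizes it as the lightest mode.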
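FIG. 3 is a flowchart of a third embodiment of a content identification method according to the present invention. In this embodiment, the content management client selects multiple (for example, three) identification modes, and the content management server applies the selected modes one after another in a preset execution order to identify the attribute of the content to be identified. As shown in FIG. 3, this embodiment includes:
Step 31: The content management client selects identification modes for the content to be identified according to the load on the content identification system or preset security requirements. The selected modes are the content-identifier-based mode, the tampering-information-based mode, and the content-metadata-based mode, and the client extracts the identification information for each of them. For the content-identifier-based mode, the extracted information includes the content identifier (ID) of the content to be identified and first auxiliary identification information, which may include content size (Size) information. For the tampering-information-based mode, the extracted information includes the content identifier and second auxiliary identification information, which may include content source address and destination address (Source/Destination address) information. For the content-metadata-based mode, the extracted information includes the metadata of the content to be identified; here the metadata mainly refers to a hash of the content (content-based hash), that is, a digest of the content computed with MD5 or an SHA-family algorithm.
Step 32: The content management client sends a first content identification request to the content management server. The request includes the identification modes selected by the client (the content-identifier-based, tampering-information-based, and content-metadata-based modes) and the identification information for each mode.
Step 33: The content management server receives the first content identification request and, following a preset execution order from simple modes to complex modes, applies the corresponding modes one after another to identify the attribute of the content to be identified. It first applies the content-identifier-based mode with the corresponding identification information: according to the content identifier (ID) in the identification information, it queries the real attribute information stored on the server for a real content identifier identical to the content identifier. If one exists, identification in the content-identifier-based mode succeeds and step 36 is performed; otherwise, identification in that mode fails and step 34 is performed. The real attribute information stored by the content management server may include the real content identifier, the real content name, the real content size information, and so on.
Step 34: The content management server applies the tampering-information-based mode. According to the corresponding identification information (content identifier, content source address information, or content destination address information), it queries the stored tampering records for a tampered content identifier that matches the content identifier. If such a record exists, the server further checks whether the stored tampering records contain information matching the content source address information or content destination address information in the identification information. If they do, the server obtains the stored real content identifier that corresponds to the tampered content identifier; because the content has now been identified successfully in the tampering-information-based mode, the server does not need to go on to the content-metadata-based mode, and step 35 is performed. If the stored tampering records contain no tampered content identifier matching the content identifier, or contain no content source address information or content destination address information matching that in the identification information, the server applies the content-metadata-based mode and identifies the content according to the corresponding identification information (the hash of the content) (not shown in FIG. 3). The tampering records stored by the content management server may include the tampered content identifier, content identification time information, content source address and destination address information, and so on.
Step 35: The content management server obtains the real content identifier from the tampered content identifier and queries the stored real attribute information of the content according to that real content identifier.
Step 36: The content management server sends an identification success message to the content management client. The message includes the identification result and the attribute information of the content to be identified. The process ends.
Step 37: The content management server sends a content identification failure message to the content management client. The process ends.
In this embodiment, the content management client may select multiple identification modes according to the load on the content identification system or preset security requirements, and the content management server applies them one after another in a preset order or rule (for example, from simple modes to complex modes), identifying the content according to the identification information for the mode currently applied. Once the server has identified the attribute of the content with a simpler mode, it no longer applies the more complex modes selected by the client. For example, even if the client has selected several modes including the tampering-information-based mode and the metadata-based mode, the server does not need to apply the metadata-based mode once the tampering-information-based mode has succeeded. This helps reduce the load on the content identification system and improve the efficiency of content identification.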
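The server-side behavior of the third embodiment, trying the selected modes from simple to complex and stopping at the first success, can be sketched as follows. All names and sample records are hypothetical, and the request is modeled as a plain dictionary keyed by mode; this is an illustration of the ordering logic, not the patented implementation.

```python
REAL_ATTRIBUTES = {"movie-001": {"owner": "Studio A"}}
TAMPERING_RECORDS = [{
    "tampered_id": "holiday-clip", "real_id": "movie-001",
    "source": "10.0.0.5", "destination": "10.0.0.9",
}]
REAL_METADATA = {"digest-abc": "movie-001"}  # metadata digest -> real content ID


def by_content_id(info):
    cid = info["content_id"]
    return cid if cid in REAL_ATTRIBUTES else None


def by_tampering_info(info):
    for rec in TAMPERING_RECORDS:
        same_id = rec["tampered_id"] == info["content_id"]
        same_addr = (rec["source"] == info.get("source")
                     or rec["destination"] == info.get("destination"))
        if same_id and same_addr:
            return rec["real_id"]  # step 35: map tampered ID to real ID
    return None


def by_metadata(info):
    return REAL_METADATA.get(info["metadata"])


# Preset execution order: simplest mode first.
ORDER = [("content_id", by_content_id),
         ("tampering_info", by_tampering_info),
         ("metadata", by_metadata)]


def identify(request):
    for mode, handler in ORDER:
        if mode not in request:
            continue  # the client did not select this mode
        real_id = handler(request[mode])
        if real_id is not None:
            # Stop at the first success; later modes are never run.
            return {"result": "success", "mode": mode,
                    "attributes": REAL_ATTRIBUTES[real_id]}
    return {"result": "failure"}


request = {
    "content_id": {"content_id": "holiday-clip"},
    "tampering_info": {"content_id": "holiday-clip", "source": "10.0.0.5"},
    "metadata": {"metadata": "digest-abc"},
}
reply = identify(request)
```

Here the renamed content misses the identifier lookup but is resolved from the tampering record, so the metadata lookup is skipped.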
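FIG. 4 is a flowchart of a fourth embodiment of a content identification method according to the present invention. In this embodiment, after content identification based on the mode and information selected by the content management client fails, the content management server may send correct-identification indication information to the client, instructing it to additionally provide the content feature value of the content to be identified; alternatively, when the content management client receives the identification failure message sent by the server and the mode it originally selected is not the feature-value-based mode, the client selects a new identification mode. As shown in FIG. 4, this embodiment includes:
Step 41: The content management client selects, according to the load on the content identification system or preset security requirements, the content-metadata-based mode as the identification mode for the content to be identified, and extracts the corresponding identification information, which includes the metadata (MD5 Value), the content identifier (ID) of the content to be identified, the name (Name) of the content to be identified, and so on.
Step 42: The content management client sends a first content identification request to the content management server. The request includes the content-metadata-based mode selected by the client and the corresponding identification information.
Step 43: The content management server searches the content data stored on the server according to the metadata (MD5 Value). If the stored content data contains real metadata matching the metadata (MD5 Value), identification succeeds and step 49 is performed; otherwise, step 44 is performed.
Step 44: The content management server sends an identification failure message to the content management client, returning the identification result.
When sending the identification failure message, the content management server may also carry correct-identification indication information in it, which instructs the content management client to additionally provide identification information of the content to be identified. It can be understood that the correct-identification indication information may also be sent to the content management client as a separate message.
Step 45: On receiving the identification failure message sent by the content management server, the content management client on its own initiative selects the feature-value-based mode as the identification mode for the content to be identified and extracts the corresponding identification information, that is, the feature value of the content to be identified.
If the content management server carried correct-identification indication information in the identification failure message, or sent it as a separate message, the content management client selects the feature-value-based mode according to the received indication information and extracts the corresponding identification information, that is, the feature value of the content to be identified.
Step 46: The content management client sends a second content identification request to the content management server. The request includes the new identification mode (the feature-value-based mode) and the additionally extracted identification information (the feature value of the content to be identified).
Step 47: The content management server receives the second content identification request and uses the content feature value to search the protected content feature values stored on the server for one that matches the content feature value of the content to be identified. If a match exists, identification succeeds and step 48 is performed; otherwise, identification fails and step 410 is performed.
Step 48: The content management server updates the tampering record. The information recorded in a tampering record includes: the real identifier (ID) of the content, the various identifiers (ID) it has carried after tampering, the date on which the identification was recorded, and the source address and destination address involved in the identification.
Step 49: The content management server sends an identification success message to the content management client. The message includes the identification result and the attribute information of the content to be identified. The process ends.
Step 410: The content management server sends a content identification failure message to the content management client. The process ends.
In this embodiment, when the content management client selects a simpler identification mode (for example, the metadata-based mode) and the content management server fails to identify the content, the server may send correct-identification indication information instructing the client to select a more complex mode (for example, the feature-value-based mode) and additionally provide the required identification information; alternatively, on receiving the identification failure message, the client may on its own initiative select a more complex mode and additionally extract the required identification information. This makes content identification more reliable. In addition, after the content management server identifies the content successfully, it updates the tampering records in the content data stored on the server, so that when the content management client later selects an identification mode for the same content it can prefer the tampering-information-based mode, which helps reduce the load on the content identification system and improve the efficiency of content identification.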
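The two-request exchange of the fourth embodiment can be sketched as a small client and server pair. This is a sketch under assumptions: every name is hypothetical, SHA-256 replaces the MD5 value mentioned in the text (the third embodiment allows MD5 or an SHA-family digest), the feature extractor is a caller-supplied stand-in because the patent does not define one, and the identification time field of the tampering record is omitted.

```python
import hashlib

REAL_METADATA = {}                           # digest -> real ID (empty, so lookups miss)
PROTECTED_FEATURES = {"fv-42": "movie-001"}  # feature value -> real ID
TAMPERING_RECORDS = []


def server_identify(mode, info):
    if mode == "metadata":
        real_id = REAL_METADATA.get(info["metadata"])            # step 43
    elif mode == "feature_value":
        real_id = PROTECTED_FEATURES.get(info["feature_value"])  # step 47
        if real_id is not None:
            # Step 48: remember the tampered identifier for later lookups.
            TAMPERING_RECORDS.append({
                "real_id": real_id,
                "tampered_id": info["content_id"],
                "source": info["source"],
                "destination": info["destination"],
            })
    else:
        raise ValueError("mode not covered by this sketch: " + mode)
    if real_id is None:
        # Step 44: failure, with the indication to supply a feature value
        # unless the feature-value-based mode was already used.
        return {"result": "failure",
                "supplement_feature_value": mode != "feature_value"}
    return {"result": "success", "real_id": real_id}


def client_identify(content, extract_feature_value):
    digest = hashlib.sha256(content["data"]).hexdigest()
    reply = server_identify("metadata", {"metadata": digest})    # first request
    if reply["result"] == "failure" and reply["supplement_feature_value"]:
        reply = server_identify("feature_value", {               # second request
            "content_id": content["content_id"],
            "feature_value": extract_feature_value(content["data"]),
            "source": content["source"],
            "destination": content["destination"],
        })
    return reply


content = {"content_id": "holiday-clip", "data": b"video bytes",
           "source": "10.0.0.5", "destination": "10.0.0.9"}
reply = client_identify(content, lambda data: "fv-42")
```

After the second request succeeds, the new tampering record lets a later request for the same renamed content be served by the cheaper tampering-information-based mode.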
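FIG. 5 is a flowchart of a fifth embodiment of a content identification method according to the present invention. In this embodiment, the content management client selects multiple identification modes other than the most complex one, and the content management server applies them one after another in order of complexity. If all of them have been applied and the content still has not been identified, the content management client may select the most complex mode and initiate the identification process again. As shown in FIG. 5, this embodiment includes:
Step 51: The content management client selects all identification modes other than the most complex (or highest-level) one and extracts the identification information needed for each. For example, the most complex (or highest-level) mode is the feature-value-based mode, and the other modes the client may support include the content-identifier-based mode, the tampering-information-based mode, the metadata-based mode, and the watermark-based mode. Correspondingly, for the content-identifier-based mode, the extracted identification information includes the content identifier and content size information of the content to be identified; for the tampering-information-based mode, it includes the content identifier of the content to be identified; for the watermark-based mode, it is the digital watermark in the content to be identified; and for the feature-value-based mode, it includes the content feature value of the content to be identified.
Step 52: The content management client sends a first content identification request message to the content management server. The message includes the one or more identification modes selected by the client and the identification information for each mode.
Step 53: Following a preset execution order from simple modes to complex modes, the content management server applies the corresponding modes one after another to identify the attribute of the content to be identified, that is, the content-identifier-based mode, the tampering-information-based mode, the metadata-based mode, and the watermark-based mode in turn. If none of the modes identifies the content, step 55 is performed.
If the content management server fails to identify the content in the content-identifier-based mode, it may apply the tampering-information-based mode; if that fails, it may apply the metadata-based mode. For details of how the server identifies the attribute of the content in the content-identifier-based, tampering-information-based, and metadata-based modes, see the first to fourth embodiments of the content identification method of the present invention; details are not repeated here. If the metadata-based mode also fails, the server may apply the watermark-based mode. The watermark-based mode requires the content management client to negotiate watermark-related information with the content management server in advance, including the watermark embedding and extraction algorithms and the watermark embedding position information. To prevent the content management client from leaking watermark-related information, the watermark-based mode suits scenarios in which the client is located on a monitoring gateway, a website, or an SP. The content management client extracts the digital watermark information contained in the content in step 51 and sends it to the content management server in step 52. If the client fails to extract the watermark information in step 51, for example because the watermark has been destroyed, the server cannot use the watermark-based mode. If the client extracts the watermark information successfully and sends it to the server, the server searches the content database for a stored watermark that matches the extracted watermark information; if one exists, identification succeeds, and otherwise it fails. FIG. 5 shows only the case in which the content-identifier-based, tampering-information-based, metadata-based, and watermark-based modes all fail to identify the content.
Step 54: The content management server sends an identification failure message to the content management client, returning the identification result. Step 55 is performed.
Step 55: On receiving the identification failure message sent by the content management server, the content management client selects the feature-value-based mode as the identification mode for the content to be identified and extracts the corresponding identification information, that is, the feature value of the content to be identified.
Step 56: The content management client sends a second content identification request to the content management server. The request includes the new identification mode (the feature-value-based mode) and the additionally extracted identification information (the feature value of the content to be identified).
Step 57: The content management server receives the second content identification request and uses the content feature value to search the protected content feature values stored on the server for one that matches the content feature value of the content to be identified. If a match exists, identification succeeds and step 58 is performed; otherwise, identification fails and step 510 is performed.
Step 58: The content management server updates the tampering record. The information recorded in a tampering record includes: the real identifier (ID) of the content, the various identifiers (ID) it has carried after tampering, the date on which the identification was recorded, and the source address and destination address involved in the identification.
Step 59: The content management server sends an identification success message to the content management client. The message includes the identification result and the attribute information of the content to be identified. The process ends.
Step 510: The content management server sends a content identification failure message to the content management client. The process ends.
In this embodiment, when the content management client selects several simpler identification modes (for example, the content-identifier-based, tampering-information-based, metadata-based, and watermark-based modes) and the content management server, applying them in turn, fails to identify the content, the client may select a more complex mode (for example, the feature-value-based mode), additionally extract the content feature value, and initiate the identification process again. This makes content identification more flexible and reliable. In addition, after the content management server identifies the content successfully, it updates the tampering records in the content data stored on the server, so that when the content management client later selects an identification mode for the same content it can prefer a simpler mode (for example, the tampering-information-based mode), which helps reduce the load on the content identification system and improve the efficiency of content identification.
This embodiment shows the case in which the content management client sends two content identification request messages to the content management server. It can be understood that, on the basis of the technical solution of this embodiment, the client may send three or more content identification requests. For example, each time the server fails to identify the content, the client may send a new content identification request message carrying a newly selected identification mode and additionally extracted identification information. The client stops sending content identification requests only when it receives a content identification success message from the server, or when it has already selected the most complex mode that the content identification system supports and the server still fails to identify the content.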
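The repeated-request behavior described at the end of this embodiment is an escalation loop with two stop conditions: success, or exhaustion of the most complex mode. A minimal sketch follows, assuming a hypothetical `send_request` callable that sends one request for the given mode and returns whether the server identified the content.

```python
# Identification modes ordered from simplest to most complex.
MODES = ["content_id", "tampering_info", "metadata", "watermark", "feature_value"]


def identify_with_escalation(send_request, start="content_id"):
    """Retry with progressively more complex modes until one succeeds.

    Returns (identified, modes_tried). The loop ends on the first success
    or after the most complex supported mode has been tried.
    """
    attempts = []
    for mode in MODES[MODES.index(start):]:
        attempts.append(mode)
        if send_request(mode):
            return True, attempts
    return False, attempts


# Stand-in server for the example: only the watermark lookup matches.
identified, tried = identify_with_escalation(lambda mode: mode == "watermark")
```

In this example the feature-value-based mode is never reached, so the most expensive extraction is avoided whenever a cheaper mode is enough.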
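FIG. 6 is a flowchart of an embodiment of another content identification method according to the present invention. As shown in FIG. 6, this embodiment includes:
Step 61: The content management server obtains, from the first content identification request received from the content management client, the identification mode included in the request and the identification information corresponding to that mode.
Step 62: The content management server identifies the attribute of the content to be identified by using the identification mode included in the first content identification request, according to the identification information included in the request and the content data information pre-stored on the server.
In this embodiment, the content management server receives the first content identification request sent by the content management client and identifies the attribute of the content to be identified by using the mode included in the request and according to the information included in the request. The mode and information selected by the client thus serve as the mode and information used in the identification process, which lets the client flexibly select the identification mode according to the load on the content identification system or actual security requirements, and helps reduce the load on the system and improve the efficiency of content identification. For details of the identification modes that the client includes in the first content identification request, the identification information needed for each mode, and how the server identifies content according to a specific mode and its information, see the first to fifth embodiments of the content identification method of the present invention and FIGS. 1-5; details are not repeated here.
FIG. 7 is a schematic structural diagram of a first embodiment of a content management client according to the present invention. As shown in FIG. 7, this embodiment includes a selection module 71, an extraction module 72, and a sending module 73.
The selection module 71 is configured to select an identification mode for the content to be identified.
The extraction module 72 is configured to extract the identification information of the content to be identified that corresponds to the mode selected by the selection module 71.
The sending module 73 is configured to send a first content identification request. The request includes the mode selected by the selection module 71 and the information extracted by the extraction module 72, and is used to request the content management server to identify the attribute of the content to be identified by using that mode and according to that information.
In this embodiment, the selection module selects the identification mode, the extraction module extracts from the content the identification information needed for identification in the selected mode, and the sending module sends the content management server a first content identification request that includes the mode and the information, instructing it to identify the attribute of the content by using the included mode and according to the included information. The content management client can therefore flexibly select the identification mode according to the load on the content identification system or actual security requirements, which helps reduce the load on the system and improve the efficiency of content identification.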
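FIG. 8 is a schematic structural diagram of a second embodiment of a content management client according to the present invention. This embodiment differs from the first embodiment of the content management client in that the identification modes include the content-identifier-based mode, the tampering-information-based mode, the content-metadata-based mode, the watermark-based mode, the feature-value-based mode, or other modes, and the identification information includes the content identifier, content size information, content source address information, content destination address information, metadata, digital watermark, or content feature value of the content to be identified, or other identification information needed for identification in the corresponding mode.
The selection module 71 is specifically configured to select, according to the load on the content identification system or preset security requirements, the content-identifier-based mode, the tampering-information-based mode, the content-metadata-based mode, the watermark-based mode, or the feature-value-based mode as the identification mode for the content to be identified.
The extraction module 72 includes at least one of the following units: a content information extraction unit 721, a tampering information extraction unit 722, a metadata extraction unit 723, a watermark extraction unit 724, and a feature value extraction unit 725.
The content information extraction unit 721 is configured to extract, when the mode selected by the selection module 71 is the content-identifier-based mode, the content identifier of the content to be identified and first auxiliary identification information, where the first auxiliary identification information includes content size information.
The tampering information extraction unit 722 is configured to extract, when the mode selected by the selection module 71 is the tampering-information-based mode, the content identifier of the content to be identified and second auxiliary identification information, where the second auxiliary identification information includes content source address information or content destination address information.
The metadata extraction unit 723 is configured to extract the metadata of the content to be identified when the mode selected by the selection module 71 is the content-metadata-based mode.
The watermark extraction unit 724 is configured to extract the digital watermark in the content to be identified when the mode selected by the selection module 71 is the watermark-based mode.
The feature value extraction unit 725 is configured to extract the content feature value of the content to be identified when the mode selected by the selection module 71 is the feature-value-based mode.
In this embodiment, the selection module can flexibly select the identification mode according to the load on the content identification system or preset security requirements, and the extraction module extracts the identification information needed for the selected mode, which helps reduce the load on the system and improve the efficiency of content identification.
On the basis of the technical solution of this embodiment, to make the identification process more reliable, the selection module 71 may be further configured to select a new identification mode when correct-identification indication information sent by the content management server is received, or when an identification failure message sent by the server is received and the mode included in the first content identification request is not the feature-value-based mode. Correspondingly, the extraction module 72 is further configured to additionally extract the identification information corresponding to the new mode, and the sending module 73 is further configured to send a second content identification request that includes the new mode selected by the selection module 71 and the information additionally extracted by the extraction module 72. Further, if the selection module 71 selects multiple identification modes, the sending module 73 may be further configured to encapsulate, in the first content identification request message, the multiple modes selected by the selection module and the identification information extracted by the extraction module for each mode, and to send the message to the content management server.
The content management client embodiments of the present invention may be implemented as a stand-alone device or as a function module integrated on an entity such as a monitoring gateway, a user terminal, or a content sharing website. For the specific method by which the content management client embodiments implement content management, see the first to fifth embodiments of the content identification method of the present invention and FIGS. 1-5; details are not repeated here.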
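FIG. 9 is a schematic structural diagram of a first embodiment of a content management server according to the present invention. As shown in FIG. 9, this embodiment includes an acquisition module 91 and an identification module 92.
The acquisition module 91 is configured to obtain, from the first content identification request received from the content management client, the identification mode included in the request and the identification information corresponding to that mode.
The identification module 92 is configured to identify the attribute of the content to be identified by using that mode, according to the identification information and the pre-stored content data information.
In this embodiment, the acquisition module obtains the mode and the information included in the first content identification request sent by the content management client, and the identification module uses them to identify the attribute of the content to be identified. The mode and information selected by the client thus serve as the mode and information used by the identification module, which lets the client flexibly select the identification mode according to the load on the content identification system or actual security requirements, and helps reduce the load on the system and improve the efficiency of content identification.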
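FIG. 10 is a schematic structural diagram of a second embodiment of a content management server according to the present invention. This embodiment differs from the first embodiment of the content management server in that it further includes a content database 93, and the identification module 92 includes at least one of the following units: a content information identification unit 921, a tampering information identification unit 922, a metadata identification unit 923, a watermark identification unit 924, and a feature value identification unit 925.
The content database 93 is configured to store content data information. The content data information may include real attribute information of content, tampering records, real watermark information, real metadata, protected content feature values, or other information about the content. The real attribute information may include the real content identifier, the real content size information, and so on. A tampering record may include a tampered content identifier, content identification time information, content source address and destination address information, and so on.
In the identification module 92, the content information identification unit 921 is configured to: when the obtained identification mode is the content-identifier-based mode and the obtained identification information includes the content identifier of the content to be identified and first auxiliary identification information, query the real attribute information stored in the content database for a real content identifier that matches the content identifier; if one exists, compare the first auxiliary identification information stored for that real content identifier with the corresponding information in the identification information; and if they are consistent, send a content identification success message. The first auxiliary identification information includes content size information.
The tampering information identification unit 922 is configured to: when the obtained identification mode is the tampering-information-based mode and the obtained identification information includes the content identifier of the content to be identified and second auxiliary identification information, query the stored tampering records for a tampered content identifier that matches the content identifier; if such a record exists and the second auxiliary identification information is consistent with the corresponding information in the record, obtain the real content identifier corresponding to the tampered content identifier; and query the real attribute information stored by the content management server according to that real content identifier. The second auxiliary identification information includes content source address information or content destination address information.
The metadata identification unit 923 is configured to: when the identification mode obtained by the acquisition module is the content-metadata-based mode and the obtained identification information includes the metadata of the content to be identified, query the real metadata stored in the content database for real metadata that matches the metadata; if a match exists, send a content identification success message.
The watermark identification unit 924 is configured to: when the identification mode obtained by the acquisition module is the watermark-based mode and the obtained identification information includes the digital watermark in the content to be identified, query the real watermark information stored in the content database for real watermark information that matches the digital watermark; if a match exists, send a content identification success message.
The feature value identification unit 925 is configured to: when the identification mode obtained by the acquisition module is the feature-value-based mode and the obtained identification information includes the content feature value of the content to be identified, query the protected content feature values stored in the content database for one that matches the content feature value of the content to be identified; if a match exists, send a content identification success message.
In this embodiment, the acquisition module obtains, from the first content identification request sent by the content management client, the mode and the information selected by the client, and the identification module identifies the content according to that mode and information and the content data information stored in the content database. The content management client can therefore flexibly select the identification mode according to the load on the content identification system or preset security requirements, which helps reduce the load on the system and improve the efficiency of content identification.
On the basis of the technical solution of this embodiment, to improve the accuracy and reliability of content identification, the identification module may further include an identification indication information sending unit. This unit is configured to send correct-identification indication information to the content management client when the selected mode is the content-identifier-based mode, the tampering-information-based mode, the content-metadata-based mode, or the watermark-based mode, and content identification fails; the indication information instructs the client to additionally provide the content feature value of the content to be identified. If, on receiving the content identification failure message sent by the content management server, the client itself initiates the process of selecting a new mode and additionally extracting the required identification information, the identification module 92 may be further configured to receive a second content identification request message sent by the client, where the message includes the new mode and the additionally provided identification information, and to identify the attribute of the content to be identified by using the new mode and according to the additionally provided information.
Further, to improve the efficiency of later identification of the same content, the content management server may further include an update module. The update module is configured to: when the identification mode obtained by the acquisition module is the content-metadata-based mode, the watermark-based mode, or the feature-value-based mode, and the identification module identifies the content successfully, update or store the tampering record corresponding to the content to be identified according to its real content identifier, tampered content identifier, content source address and destination address information, or content identification time information. If the first content identification request sent by the content management client includes multiple identification modes, the identification module may be further configured to apply the corresponding modes one after another, in a preset execution order, to identify the attribute of the content to be identified.
For the specific method by which the content management server embodiments implement content management, see the embodiment of the other content identification method of the present invention and FIG. 6; details are not repeated here.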
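FIG. 11 is a schematic structural diagram of an embodiment of a content identification system according to the present invention. As shown in FIG. 11, this embodiment includes a content management client 111 and a content management server 112.
The content management client 111 is configured to select an identification mode for the content to be identified; extract the identification information of the content to be identified that corresponds to the selected mode; and send a first content identification request that includes the selected mode and the identification information.
The content management server 112 is configured to obtain, from the first content identification request received from the content management client, the identification mode included in the request and the identification information corresponding to that mode; and to identify the attribute of the content to be identified by using that mode, according to the identification information and the pre-stored content data information.
In this embodiment, the content management client selects the identification mode, extracts from the content the identification information needed for identification in the selected mode, and sends the content management server a first content identification request that includes the mode and the information; the server then identifies the attribute of the content by using the included mode and according to the included information. The client can therefore flexibly select the identification mode according to the load on the content identification system or actual security requirements, which helps reduce the load on the system and improve the efficiency of content identification.
In the content identification system of the present invention, for the detailed function modules of the content management client, see the content management client embodiments and FIGS. 7-8; for those of the content management server, see the content management server embodiments and FIGS. 9-10. Details are not repeated here.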
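Persons of ordinary skill in the art can understand that each accompanying drawing is merely a schematic diagram of one preferred embodiment, and that the modules or processes in the drawings are not necessarily required for implementing the present invention.
Persons of ordinary skill in the art can understand that the modules of the apparatus in an embodiment may be distributed in the apparatus as described in the embodiment, or may be relocated, with corresponding changes, to one or more apparatuses different from that of the embodiment. The modules of the foregoing embodiments may be combined into one module or further split into multiple sub-modules.
The sequence numbers of the foregoing embodiments of the present invention are for description only and do not indicate which embodiments are better or worse.
Persons of ordinary skill in the art can understand that all or some of the steps of the foregoing method embodiments may be implemented by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the foregoing method embodiments; the storage medium includes any medium that can store program code, such as ROM, RAM, a magnetic disk, or an optical disc.
Finally, it should be noted that the foregoing embodiments are intended only to describe the technical solutions of the present invention, not to limit them. Although the present invention is described in detail with reference to the foregoing embodiments, persons of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments or make equivalent replacements of some of their technical features, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.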
Claims
1、 一种内容识别方法, 其特征在于, 包括:
选取待识别内容的识别方式;
提取与选取的所述识别方式相应的所述待识别内容的识别信息; 发送第一内容识别请求, 所述第一内容识别请求中包括选取的所述识别 方式和识别信息, 用于请求内容管理服务器采用所述识别方式并根据所述识 别信息识别所述待识别内容的属性。
2、 根据权利要求 1所述的内容识别方法, 其特征在于, 所述选取待识别 内容的识别方式, 包括:
根据内容识别系统的负荷情况、 具体应用场景或预先设置的安全需求, 选取待识别内容的识别方式。
3、 根据权利要求 2所述的内容识别方法, 其特征在于, 所述识别方式包 括基于内容标识识别方式、 基于篡改信息识别方式、 基于内容元数据识别方 式、 基于水印识别方式或基于特征值识别方式; 所述识别信息包括待识别内 容的内容标识、 内容大小信息、 内容源地址信息、 内容目标地址信息、 元数 据、 数字水印或内容特征值。
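4. The content identification method according to claim 3, characterized in that extracting the identification information of the content to be identified that corresponds to the selected identification mode comprises:
when the selected identification mode is the content-identifier-based identification mode, extracting the content identifier of the content to be identified and first auxiliary identification information, wherein the first auxiliary identification information comprises content size information;
when the selected identification mode is the tampering-information-based identification mode, extracting the content identifier of the content to be identified and second auxiliary identification information, wherein the second auxiliary identification information comprises content source address information or content target address information;
when the selected identification mode is the content-metadata-based identification mode, extracting the metadata of the content to be identified;
when the selected identification mode is the watermark-based identification mode, extracting the digital watermark in the content to be identified; and
when the selected identification mode is the feature-value-based identification mode, extracting the content feature value of the content to be identified.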
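5. The content identification method according to claim 3, characterized by further comprising, after sending the first content identification request:
selecting a new identification mode on receiving correct identification indication information sent by the content management server; or selecting a new identification mode on receiving identification failure information sent by the content management server when the identification mode included in the first content identification request is not the feature-value-based identification mode;
additionally extracting identification information corresponding to the new identification mode; and
sending a second content identification request to the content management server, wherein the second content identification request includes the new identification mode and the additionally extracted identification information.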
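6. The content identification method according to claim 3, characterized in that sending the first content identification request comprises:
encapsulating, in the first content identification request message, multiple selected identification modes and the identification information corresponding to each identification mode, and sending the message to the content management server.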
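7. A content identification method, characterized by comprising:
obtaining, from a first content identification request received from a content management client, the identification mode included in the first content identification request and the identification information corresponding to the identification mode; and
identifying an attribute of content to be identified by using the identification mode according to the identification information and pre-stored content data information.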
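8. The content identification method according to claim 7, characterized by further comprising:
storing the content data information, wherein the content data information comprises real attribute information of content, a tampering record, real watermark information, real metadata, or a protected content feature value; the real attribute information comprises a real content identifier and content size information; and the tampering record comprises a tampered content identifier, content identification time information, content source address information, and content target address information.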
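9. The content identification method according to claim 8, characterized in that identifying the attribute of the content to be identified according to the identification information and the pre-stored content data information comprises:
when the obtained identification mode is the content-identifier-based identification mode and the obtained identification information comprises the content identifier of the content to be identified and first auxiliary identification information, querying, according to the content identifier, whether the real content attribute information stored in the content database contains a real content identifier that matches the content identifier; if so, comparing whether the stored first auxiliary identification information corresponding to the real content identifier is consistent with the corresponding information included in the identification information; and if they are consistent, sending a content identification success message, wherein the first auxiliary identification information comprises content size information;
when the obtained identification mode is the tampering-information-based identification mode and the obtained identification information comprises the content identifier of the content to be identified and second auxiliary identification information, querying, according to the content identifier, whether the stored tampering records contain a tampered content identifier that matches the content identifier; if the tampering records contain a matching tampered content identifier and the second auxiliary identification information is consistent with the corresponding information in the tampering record, obtaining the real content identifier corresponding to the tampered content identifier; and querying, according to the real content identifier, the real attribute information of the content stored on the content management server, wherein the second auxiliary identification information comprises content source address information or content target address information;
when the obtained identification mode is the content-metadata-based identification mode and the obtained identification information comprises the metadata of the content to be identified, querying, according to the metadata, whether the real metadata stored in the content database contains real metadata that matches the metadata; and if so, sending a content identification success message;
when the obtained identification mode is the watermark-based identification mode and the obtained identification information comprises the digital watermark in the content to be identified, querying, according to the digital watermark, whether the real watermark information stored in the content database contains real watermark information that matches the digital watermark; and if so, sending a content identification success message; and
when the obtained identification mode is the feature-value-based identification mode and the obtained identification information comprises the content feature value of the content to be identified, querying, according to the content feature value, whether the stored protected content feature values contain a protected content feature value that matches the content feature value of the content to be identified; and if so, sending a content identification success message.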
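10. The content identification method according to claim 9, characterized by further comprising, when the selected identification mode is the content-identifier-based, tampering-information-based, content-metadata-based, or watermark-based identification mode and content identification fails:
sending correct identification indication information to the content management client to instruct the content management client to additionally provide the content feature value corresponding to the content to be identified.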
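11. The content identification method according to claim 10, characterized by further comprising, after sending the correct identification indication information to the content management client:
receiving a second content identification request message sent by the content management client, wherein the second content identification request message includes a new identification mode and additionally provided identification information; and
identifying the attribute of the content to be identified by using the new identification mode according to the additionally provided identification information.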
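12. The content identification method according to claim 9, characterized by further comprising, when the selected identification mode is the content-metadata-based, watermark-based, or feature-value-based identification mode and content identification succeeds:
updating or storing the tampering record corresponding to the content to be identified according to the real content identifier, the tampered content identifier, the content source address information, the content target address information, or the content identification time information of the content to be identified.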
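13. The content identification method according to claim 9, characterized in that, when the received first content identification request message includes multiple identification modes, the attribute of the content to be identified is identified by applying the corresponding identification modes one after another in a preset execution order.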
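14. A content management client, characterized by comprising:
a selection module, configured to select an identification mode for content to be identified;
an extraction module, configured to extract identification information of the content to be identified that corresponds to the selected identification mode; and
a sending module, configured to send a first content identification request, wherein the first content identification request includes the selected identification mode and the identification information and is used to request a content management server to identify an attribute of the content to be identified by using the identification mode according to the identification information.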
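15. The content management client according to claim 14, characterized in that the selection module is further configured to select the identification mode for the content to be identified according to the load of the content identification system, the specific application scenario, or a preset security requirement.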
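16. The content management client according to claim 15, characterized in that the identification mode comprises a content-identifier-based identification mode, a tampering-information-based identification mode, a content-metadata-based identification mode, a watermark-based identification mode, or a feature-value-based identification mode; the identification information comprises a content identifier, content size information, metadata, a digital watermark, or a content feature value of the content to be identified; and the extraction module comprises at least one of the following units:
a content information extraction unit, configured to extract the content identifier of the content to be identified and first auxiliary identification information when the identification mode selected by the selection module is the content-identifier-based identification mode, wherein the first auxiliary identification information comprises content size information;
a tampering information extraction unit, configured to extract the content identifier of the content to be identified and second auxiliary identification information when the identification mode selected by the selection module is the tampering-information-based identification mode, wherein the second auxiliary identification information comprises content source address information or content target address information;
a metadata extraction unit, configured to extract the metadata of the content to be identified when the identification mode selected by the selection module is the content-metadata-based identification mode;
a watermark extraction unit, configured to extract the digital watermark in the content to be identified when the identification mode selected by the selection module is the watermark-based identification mode; and
a feature value extraction unit, configured to extract the content feature value of the content to be identified when the identification mode selected by the selection module is the feature-value-based identification mode.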
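17. The content management client according to claim 16, characterized in that:
the selection module is further configured to select a new identification mode on receiving correct identification indication information sent by the content management server, or to select a new identification mode on receiving identification failure information sent by the content management server when the identification mode included in the first content identification request is not the feature-value-based identification mode;
the extraction module is further configured to additionally extract identification information corresponding to the new identification mode; and
the sending module is further configured to send a second content identification request, wherein the second content identification request includes the new identification mode selected by the selection module and the identification information additionally extracted by the extraction module.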
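18. The content management client according to claim 16, characterized in that the sending module is further configured to encapsulate, in the first content identification request message, multiple identification modes selected by the selection module and the identification information extracted by the extraction module for each identification mode, and to send the message.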
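19. A content management server, characterized by comprising:
an obtaining module, configured to obtain, from a first content identification request received from a content management client, the identification mode included in the first content identification request and the identification information corresponding to the identification mode; and
an identification module, configured to identify an attribute of content to be identified by using the identification mode according to the identification information and pre-stored content data information.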
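20. The content management server according to claim 19, characterized by further comprising:
a content database, configured to store the content data information, wherein the content data information comprises real attribute information of content, a tampering record, real watermark information, real metadata, or a protected content feature value; the real attribute information comprises a real content identifier and real content size information; and the tampering record comprises a tampered content identifier, content identification time information, content source address information, and content target address information.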
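21. The content management server according to claim 20, characterized in that the identification module comprises at least one of the following units:
a content information identification unit, configured to: when the obtained identification mode is the content-identifier-based identification mode and the obtained identification information comprises the content identifier of the content to be identified and first auxiliary identification information, query, according to the content identifier, whether the real content attribute information stored in the content database contains a real content identifier that matches the content identifier; if so, compare whether the stored first auxiliary identification information corresponding to the real content identifier is consistent with the corresponding information included in the identification information; and if they are consistent, send a content identification success message, wherein the first auxiliary identification information comprises content size information;
a tampering information identification unit, configured to: when the obtained identification mode is the tampering-information-based identification mode and the obtained identification information comprises the content identifier of the content to be identified and second auxiliary identification information, query, according to the content identifier, whether the stored tampering records contain a tampered content identifier that matches the content identifier; if the tampering records contain a matching tampered content identifier and the second auxiliary identification information is consistent with the corresponding information in the tampering record, obtain the real content identifier corresponding to the tampered content identifier; and query, according to the real content identifier, the real attribute information of the content stored on the content management server, wherein the second auxiliary identification information comprises content source address information or content target address information;
a metadata identification unit, configured to: when the identification mode obtained by the obtaining module is the content-metadata-based identification mode and the obtained identification information comprises the metadata of the content to be identified, query, according to the metadata, whether the real metadata stored in the content database contains real metadata that matches the metadata; and if so, send a content identification success message;
a watermark identification unit, configured to: when the identification mode obtained by the obtaining module is the watermark-based identification mode and the obtained identification information comprises the digital watermark in the content to be identified, query, according to the digital watermark, whether the real watermark information stored in the content database contains real watermark information that matches the digital watermark; and if so, send a content identification success message; and
a feature value identification unit, configured to: when the identification mode obtained by the obtaining module is the feature-value-based identification mode and the obtained identification information comprises the content feature value of the content to be identified, query, according to the content feature value, whether the protected content feature values stored in the content database contain a protected content feature value that matches the content feature value of the content to be identified; and if so, send a content identification success message.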
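22. The content management server according to claim 21, characterized in that the identification module further comprises:
an identification indication information sending unit, configured to send correct identification indication information to the content management client when the selected identification mode is the content-identifier-based, tampering-information-based, content-metadata-based, or watermark-based identification mode and content identification fails, so as to instruct the content management client to additionally provide the content feature value corresponding to the content to be identified.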
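23. The content management server according to claim 22, characterized in that the identification module is further configured to receive a second content identification request message sent by the content management client, wherein the second content identification request message includes a new identification mode and additionally provided identification information, and to identify the attribute of the content to be identified by using the new identification mode according to the additionally provided identification information.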
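24. The content management server according to claim 21, characterized by further comprising:
an update module, configured to update or store the tampering record corresponding to the content to be identified according to the real content identifier, the tampered content identifier, the content source address and target address information, or the content identification time information of the content to be identified, when the identification mode obtained by the obtaining module is the content-metadata-based, watermark-based, or feature-value-based identification mode and the identification module identifies the content successfully.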
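25. The content management server according to claim 21, characterized in that the identification module is further configured to, when the received first content identification request message includes multiple identification modes, identify the attribute of the content to be identified by applying the corresponding identification modes one after another in a preset execution order.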
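26. A content identification system, characterized by comprising:
a content management client, configured to select an identification mode for content to be identified; extract identification information of the content to be identified that corresponds to the selected identification mode; and send a first content identification request, wherein the first content identification request includes the selected identification mode and the identification information; and
a content management server, configured to obtain, from the first content identification request received from the content management client, the identification mode included in the first content identification request and the identification information corresponding to the identification mode; and to identify an attribute of the content to be identified by using the identification mode according to the identification information and pre-stored content data information.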
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP09765348.9A EP2275949B1 (en) | 2008-06-19 | 2009-05-04 | Content identification method and system, content management client and server |
US12/537,643 US8527651B2 (en) | 2008-06-19 | 2009-08-07 | Content identification method and system, and SCIDM client and server |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2008101152491A CN101610152B (zh) | 2008-06-19 | 2008-06-19 | Content identification method and system, content management client and server |
CN200810115249.1 | 2008-06-19 |
Related Child Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US12/537,643 Continuation US8527651B2 (en) | 2008-06-19 | 2009-08-07 | Content identification method and system, and SCIDM client and server |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2009152709A1 (zh) | 2009-12-23 |
Family
ID=41433676
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2009/071626 WO2009152709A1 (zh) | 2008-06-19 | 2009-05-04 | Content identification method and system, content management client and server |
Country Status (3)
Country | Link |
---|---|
EP (1) | EP2275949B1 (zh) |
CN (1) | CN101610152B (zh) |
WO (1) | WO2009152709A1 (zh) |
Families Citing this family (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9865017B2 (en) | 2003-12-23 | 2018-01-09 | Opentv, Inc. | System and method for providing interactive advertisement |
US10387920B2 (en) | 2003-12-23 | 2019-08-20 | Roku, Inc. | System and method for offering and billing advertisement opportunities |
CN101788980A (zh) * | 2009-01-23 | 2010-07-28 | 中兴通讯股份有限公司 | Method and system for content registration, identification, and retrieval |
JP5896222B2 (ja) * | 2012-03-21 | 2016-03-30 | ソニー株式会社 | Terminal device, relay device, information processing method, program, and content identification system |
CN102799804A (zh) * | 2012-04-30 | 2012-11-28 | 珠海市君天电子科技有限公司 | Method and system for comprehensive security assessment of unknown files |
EP2722808A1 (en) * | 2012-09-17 | 2014-04-23 | OpenTV, Inc. | Automatic localization of advertisements |
CN103973708B (zh) * | 2014-05-26 | 2018-09-07 | 中电长城网际系统应用有限公司 | Method and system for determining a leakage event |
CN104486312B (zh) * | 2014-12-04 | 2018-09-04 | 北京奇虎科技有限公司 | Method and apparatus for identifying an application |
CN106844223B (zh) * | 2016-12-20 | 2021-04-09 | 北京大学 | Data search system and method |
US11145365B2 (en) | 2016-12-20 | 2021-10-12 | Peking University | Data search systems and methods |
CN108667715A (zh) * | 2018-03-27 | 2018-10-16 | 北京泰迪熊移动科技有限公司 | Flight information identification method, apparatus, storage medium, and processor |
CN109922143A (zh) * | 2019-02-26 | 2019-06-21 | 南威软件股份有限公司 | Method and system for file exchange based on a network gap |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1818964A (zh) * | 2001-02-22 | 2006-08-16 | 索尼公司 | Content providing/obtaining system |
US20060276174A1 (en) * | 2005-04-29 | 2006-12-07 | Eyal Katz | Method and an apparatus for provisioning content data |
CN101072116A (zh) * | 2007-04-28 | 2007-11-14 | 华为技术有限公司 | Service selection method, apparatus, and system, and client application server |
US20080066185A1 (en) * | 2006-09-12 | 2008-03-13 | Adobe Systems Incorporated | Selective access to portions of digital content |
US20080091799A1 (en) * | 2006-10-16 | 2008-04-17 | Weaver Ralph J | Access to Internet Content Via Telephone |
CN101176094A (zh) * | 2005-03-31 | 2008-05-07 | 谷歌公司 | System and method for obtaining content based on data from an electronic device |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2004503880A (ja) * | 2000-06-10 | 2004-02-05 | マークエニー・インコーポレイテッド | System and method for providing and authenticating works based on digital watermarking technology |
US7043050B2 (en) * | 2001-05-02 | 2006-05-09 | Microsoft Corporation | Software anti-piracy systems and methods utilizing certificates with digital content |
US6824051B2 (en) * | 2001-06-07 | 2004-11-30 | Contentguard Holdings, Inc. | Protected content distribution system |
JP4289436B1 (ja) * | 2008-03-18 | 2009-07-01 | 日本電気株式会社 | Load balancing system and load balancing method |
- 2008-06-19: CN CN2008101152491A, patent CN101610152B (zh), status Active
- 2009-05-04: WO PCT/CN2009/071626, patent WO2009152709A1 (zh), status Application Filing
- 2009-05-04: EP EP09765348.9A, patent EP2275949B1 (en), status Active
Also Published As
Publication number | Publication date |
---|---|
EP2275949A4 (en) | 2011-06-15 |
EP2275949A1 (en) | 2011-01-19 |
CN101610152B (zh) | 2012-02-01 |
CN101610152A (zh) | 2009-12-23 |
EP2275949B1 (en) | 2014-12-24 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | WWE | Wipo information: entry into national phase | Ref document number: 2009765348; Country of ref document: EP |
 | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 09765348; Country of ref document: EP; Kind code of ref document: A1 |
 | NENP | Non-entry into the national phase | Ref country code: DE |