CN110851761A - Infringement detection method, device and equipment based on block chain and storage medium - Google Patents

Publication number
CN110851761A
Authority
CN
China
Prior art keywords
infringement
content
original
work
similarity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010039286.XA
Other languages
Chinese (zh)
Inventor
黄凯明
杨磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd
Priority to CN202010039286.XA
Publication of CN110851761A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/10 Protecting distributed programs or content, e.g. vending or licensing of copyrighted material; Digital rights management [DRM]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/18 Legal services
    • G06Q50/184 Intellectual property management
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L2209/00 Additional information or applications relating to cryptographic mechanisms or cryptographic arrangements for secret or secure communication H04L9/00
    • H04L2209/60 Digital content management, e.g. content distribution
    • H04L2209/603 Digital right management [DRM]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L2463/00 Additional details relating to network architectures or network communication protocols for network security covered by H04L63/00
    • H04L2463/101 Additional details relating to network architectures or network communication protocols for network security covered by H04L63/00 applying security measures for digital rights management
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/50 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols using hash chains, e.g. blockchains or hash trees

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Business, Economics & Management (AREA)
  • Technology Law (AREA)
  • Software Systems (AREA)
  • Tourism & Hospitality (AREA)
  • Health & Medical Sciences (AREA)
  • Strategic Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Operations Research (AREA)
  • Computer Hardware Design (AREA)
  • Multimedia (AREA)
  • Economics (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Computer Security & Cryptography (AREA)
  • General Business, Economics & Management (AREA)
  • Computing Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present specification provides a blockchain-based infringement detection method, apparatus, device, and storage medium. The method includes: determining the work content attribute and the work content type of a registered original work; searching a preset resource site list for at least one resource site matching the work content type of the original work, monitoring the at least one resource site, and collecting the webpage content of a webpage when a webpage of any of the resource sites is found to contain the work content attribute of the original work; calculating the similarity between a first clustering result, obtained by clustering the to-be-detected word segmentation vectors produced by word segmentation of the webpage content, and a second clustering result, obtained by clustering the original word segmentation vectors produced by word segmentation of the original work, and determining the infringement similarity between the original work and the webpage content based on that similarity; and performing infringement detection on the webpage content and the original work according to the infringement similarity, and publishing the infringement detection result to the blockchain for evidence storage.

Description

Infringement detection method, device and equipment based on block chain and storage medium
Technical Field
One or more embodiments of the present specification relate to the field of blockchain technologies, and in particular to a blockchain-based infringement detection method, apparatus, device, and storage medium.
Background
Blockchain technology, also called distributed ledger technology, is an emerging technology in which several computing devices jointly participate in "accounting" and jointly maintain a complete distributed database. Because it is decentralized and transparent, lets every computing device take part in recording the database, and synchronizes data quickly between computing devices, blockchain technology has been widely applied in many fields.
Disclosure of Invention
According to a first aspect of the present application, there is provided a blockchain-based infringement detection method, including:
determining the work information of a registered original work, the work information comprising the work content attribute and the work content type of the original work;
searching a preset resource site list for at least one resource site matching the work content type of the original work, monitoring the at least one resource site, and collecting the webpage content of a webpage when a webpage of any of the resource sites is found to contain the work content attribute of the original work;
calculating the similarity between a first clustering result, obtained by clustering the to-be-detected word segmentation vectors produced by word segmentation of the webpage content, and a second clustering result, obtained by clustering the original word segmentation vectors produced by word segmentation of the original work, and determining the infringement similarity between the original work and the webpage content based on that similarity;
and performing infringement detection on the webpage content and the original work according to the infringement similarity, and publishing the infringement detection result to a blockchain for evidence storage.
Optionally, the method further includes:
publishing the collection process of the webpage content to the blockchain for evidence storage.
Optionally, the infringement detection result includes: the calculation result of the infringement detection calculation, and/or the calculation process of the infringement detection calculation.
Optionally, performing infringement detection on the webpage content and the original work according to the infringement similarity includes:
if the infringement similarity exceeds a preset first threshold, further detecting whether the text similarity between the original work and the webpage content exceeds a preset second threshold; if so, determining that the webpage content is an infringing work; if not, determining that the webpage content is not an infringing work;
and if the infringement similarity does not exceed the preset first threshold, determining that the webpage content is not an infringing work.
Optionally, the resource site includes a Web site.
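The optional two-threshold check in the first aspect can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the function name `is_infringing`, its parameter names, and the choice to pass the text similarity as a callable (so that it is only computed once the first threshold has been exceeded) are assumptions made for the example, and "exceeds" is read as strictly greater than.

```python
from typing import Callable


def is_infringing(
    infringement_similarity: float,
    text_similarity: Callable[[], float],
    first_threshold: float,
    second_threshold: float,
) -> bool:
    """Two-stage check: the cluster-based similarity gates the text comparison."""
    # Stage 1: the infringement similarity must exceed the first threshold.
    if infringement_similarity <= first_threshold:
        return False
    # Stage 2: only now is the (hypothetical) text similarity computed and compared.
    return text_similarity() > second_threshold
```

Passing the second measure lazily reflects the wording "further detecting": the text comparison is only needed for content that has already passed the first threshold.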
According to a second aspect of the present application, there is provided a blockchain-based infringement detection apparatus, the apparatus including:
a determining module, configured to determine the work information of a registered original work, the work information comprising the work content attribute and the work content type of the original work;
an acquisition module, configured to search a preset resource site list for at least one resource site matching the work content type of the original work, monitor the at least one resource site, and collect the webpage content of a webpage when a webpage of any of the resource sites is found to contain the work content attribute of the original work;
a calculation module, configured to calculate the similarity between a first clustering result, obtained by clustering the to-be-detected word segmentation vectors produced by word segmentation of the webpage content, and a second clustering result, obtained by clustering the original word segmentation vectors produced by word segmentation of the original work, and to determine the infringement similarity between the original work and the webpage content based on that similarity;
and a detection module, configured to perform infringement detection on the webpage content and the original work according to the infringement similarity and to publish the infringement detection result to the blockchain for evidence storage.
Optionally, the detection module publishes the collection process of the webpage content to the blockchain for evidence storage.
Optionally, the infringement detection result includes: the calculation result of the infringement detection calculation, and/or the calculation process of the infringement detection calculation.
Optionally, if the infringement similarity exceeds a preset first threshold, the detection module further detects whether the text similarity between the original work and the webpage content exceeds a preset second threshold; if so, it determines that the webpage content is an infringing work; if not, it determines that the webpage content is not an infringing work; and if the infringement similarity does not exceed the preset first threshold, it determines that the webpage content is not an infringing work.
Optionally, the resource site includes a Web site.
According to a third aspect of the present specification, there is provided an electronic apparatus comprising:
a processor;
a memory for storing processor-executable instructions;
and the processor executes the executable instructions to realize the block chain-based infringement detection method.
According to a fourth aspect of the present specification, there is provided a computer readable storage medium having stored thereon computer instructions which, when executed by a processor, implement a method for block chain based infringement detection.
According to the present specification, on the one hand, the electronic device can actively search the whole network for webpage content suspected of infringement based on the work content attribute of the registered original work, so that suspected infringing works are obtained sooner and infringement is, in turn, determined sooner.
On the other hand, since data stored on the blockchain cannot be tampered with, publishing the infringement detection result to the blockchain for storage prevents the result from being tampered with and so ensures its security.
Drawings
FIG. 1 is a schematic diagram of a process for creating a smart contract, as shown in an exemplary embodiment of the present specification;
FIG. 2 is a schematic diagram of invoking a smart contract, as shown in an exemplary embodiment of the present specification;
FIG. 3 is a schematic diagram of creating a smart contract and invoking a smart contract, as shown in an exemplary embodiment of the present specification;
FIG. 4 is a flow chart of a blockchain-based infringement detection method, as shown in an exemplary embodiment of the present specification;
FIG. 5 is a diagram of the hardware configuration of an electronic device, as shown in an exemplary embodiment of the present specification;
FIG. 6 is a block diagram of a blockchain-based infringement detection apparatus, as shown in an exemplary embodiment of the present specification.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with one or more embodiments of the present specification. Rather, they are merely examples of apparatus and methods consistent with certain aspects of one or more embodiments of the specification, as detailed in the claims which follow.
It should be noted that: in other embodiments, the steps of the corresponding methods are not necessarily performed in the order shown and described herein. In some other embodiments, the method may include more or fewer steps than those described herein. Moreover, a single step described in this specification may be broken down into multiple steps for description in other embodiments; multiple steps described in this specification may be combined into a single step in other embodiments.
The present application aims to provide a blockchain-based infringement detection method in which an electronic device searches the whole network for webpage content suspected of infringement based on the work content attribute of a registered original work, performs infringement detection on the webpage content and the original work, and records the detection result on the chain.
In implementation, the electronic device may search a preset resource site list for at least one resource site matching the original work, monitor the at least one resource site found, and collect the webpage content of a webpage when a webpage of any of those sites is found to contain the work content attribute of the original work. The electronic device then performs the infringement detection calculation on the webpage content and the original work and publishes the infringement detection result to a blockchain for evidence storage.
On the one hand, the electronic device can actively search the whole network for webpage content suspected of infringement based on the work content attribute of the registered original work, so that suspected infringing works are obtained sooner and infringement is, in turn, determined sooner.
On the other hand, since data stored on the blockchain cannot be tampered with, publishing the infringement detection result to the blockchain for storage prevents the result from being tampered with and so ensures its security.
In addition, in the present application the electronic device can also record the page collection process on the chain, which ensures the security and reliability not only of the infringement detection result but also of the evidence collection process (that is, the page collection process).
Before the blockchain-based infringement detection method provided by the present specification is described, blockchain technology is briefly introduced below.
Blockchains are generally divided into three types: public chains (Public Blockchain), private chains (Private Blockchain), and consortium chains (Consortium Blockchain). There may also be combinations of these types, such as private chain + consortium chain or consortium chain + public chain.
Of these, the public chain is the most decentralized. Public chains are represented by Bitcoin and Ethereum; participants that join a public chain (also called nodes in the blockchain) can read the data records on the chain, take part in transactions, compete for the accounting right of new blocks, and so on. Moreover, each node can freely join or leave the network and perform related operations.
A private chain is the opposite: write permission to the network is controlled by an organization or institution, and read permission for the data is specified by that organization. In short, a private chain can be a weakly centralized system with strict restrictions on nodes and a small number of nodes. This type of blockchain is better suited to internal use within a particular institution.
A consortium chain is a blockchain that sits between a public chain and a private chain and can achieve "partial decentralization". Each node in a consortium chain usually corresponds to a real-world organization or institution; nodes join the network by authorization, form a consortium of stakeholders, and jointly maintain the operation of the blockchain.
A blockchain is usually composed of several blocks. Each block records a timestamp corresponding to its creation time, and all blocks form a data chain strictly ordered in time according to the timestamps recorded in them.
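As a rough illustration of a time-ordered chain of blocks, the sketch below links each block to its predecessor by hash and rejects timestamps that go backwards. The text above only mentions timestamps; linking by the previous block's SHA-256 hash is the usual blockchain construction and is added here as an assumption, as are the names `Block`, `append_block`, and `is_valid`.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Block:
    index: int
    timestamp: float               # creation time recorded in the block
    transactions: Tuple[str, ...]
    prev_hash: str                 # hash of the preceding block (assumed linking scheme)

    def hash(self) -> str:
        payload = json.dumps(
            [self.index, self.timestamp, list(self.transactions), self.prev_hash]
        )
        return hashlib.sha256(payload.encode()).hexdigest()


def append_block(chain: List[Block], transactions: Sequence[str], timestamp: float) -> List[Block]:
    """Return a new chain with one more block; timestamps must not go backwards."""
    prev = chain[-1] if chain else None
    if prev is not None and timestamp < prev.timestamp:
        raise ValueError("block timestamp precedes the previous block")
    prev_hash = prev.hash() if prev else "0" * 64
    return chain + [Block(len(chain), timestamp, tuple(transactions), prev_hash)]


def is_valid(chain: List[Block]) -> bool:
    """Check hash links and time ordering between consecutive blocks."""
    return all(
        cur.prev_hash == prev.hash() and cur.timestamp >= prev.timestamp
        for prev, cur in zip(chain, chain[1:])
    )
```

Because every block commits to the hash of the one before it, altering an earlier block changes its hash and breaks the link checked by `is_valid`.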
Real-world data can be constructed into the standard transaction format supported by the blockchain and then published to the blockchain; the node devices in the blockchain perform consensus processing on the received transaction, and once consensus is reached, the node device acting as the accounting node packs the transaction into a block, where it is persistently stored as evidence on the blockchain.
The consensus algorithms supported in a blockchain may include:
the first category of consensus algorithm, in which node devices must compete for the accounting right in each accounting round; for example, Proof of Work (PoW), Proof of Stake (PoS), and Delegated Proof of Stake (DPoS);
the second category of consensus algorithm, in which the accounting node for each accounting round is elected in advance (with no competition for the accounting right); for example, Practical Byzantine Fault Tolerance (PBFT).
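The difference between the two categories can be illustrated with two toy functions. The first is a simplified hash puzzle in the spirit of Proof of Work, where whoever finds a valid nonce first wins the accounting right; real networks hash a structured block header against a numeric target, so this is only a sketch. The second reflects PBFT's rule that the primary for a view is the view number modulo the number of replicas. The function names and the nonce encoding are assumptions made for the example.

```python
import hashlib


def proof_of_work(block_data: bytes, difficulty: int) -> int:
    """Return the smallest nonce whose SHA-256 digest starts with `difficulty` zero hex digits."""
    prefix = "0" * difficulty
    nonce = 0
    while not hashlib.sha256(block_data + str(nonce).encode()).hexdigest().startswith(prefix):
        nonce += 1
    return nonce


def pbft_primary(view: int, replica_count: int) -> int:
    """Index of the node that proposes blocks in a given view: no competition is involved."""
    return view % replica_count
```

In the first case the accounting node is only known after the search finishes; in the second it is known before the round begins.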
In a blockchain network that uses the first category of consensus algorithm, every node device competing for the accounting right can execute a transaction once it receives it. One of these node devices may win the competition in the current round and become the accounting node. The accounting node may package the received transaction with other transactions to generate the latest block, and send the latest block or its block header to the other node devices for consensus.
In a blockchain network that uses the second category of consensus algorithm, the node device holding the accounting right is agreed on before the current accounting round. A node device that receives a transaction can therefore forward it to the accounting node if it is not itself the accounting node of the current round. The accounting node of the current round may execute the transaction while, or before, packaging it with other transactions to generate the latest block. After generating the latest block, the accounting node may send the latest block or its block header to the other node devices for consensus.
As described above, whichever consensus algorithm the blockchain uses, the accounting node of the current round may package the received transactions to generate the latest block, and send the latest block or its block header to the other node devices for consensus verification. If the other node devices receive the latest block or its block header and verify that there is no problem, the latest block can be appended to the end of the existing blockchain, which completes the accounting process of the blockchain. The other nodes may also execute the transactions contained in the block while verifying the new block or block header sent by the accounting node.
In practice, a public chain, a private chain, or a consortium chain may each provide smart contract functionality. A smart contract on a blockchain is a contract whose execution can be triggered by a transaction on the blockchain. A smart contract may be defined in the form of code.
Taking Ethereum as an example, users are supported in creating and invoking complex logic in the Ethereum network. Ethereum is a programmable blockchain whose core is the Ethereum Virtual Machine (EVM), and every Ethereum node can run the EVM. The EVM is a Turing-complete virtual machine, through which all kinds of complex logic can be implemented. Smart contracts that users publish and invoke in Ethereum run on the EVM. In fact, the EVM directly runs virtual machine code (virtual machine bytecode, hereinafter "bytecode"), so a smart contract deployed on the blockchain may be bytecode. As shown in FIG. 1, after Bob sends a transaction (Transaction) containing the information for creating a smart contract to the Ethereum network, each node may execute the transaction in the EVM. In FIG. 1, the From field of the transaction records the address of the account that initiates creation of the smart contract, the contract code stored as the value of the Data field may be bytecode, and the To field of the transaction holds a null (empty) account. Once the nodes reach agreement through the consensus mechanism, the smart contract is successfully created, and users can subsequently invoke it.
After the smart contract is created, a contract account corresponding to the smart contract appears on the blockchain and has a specific address; for example, "0x68e12cf284…" in each node in FIG. 1 represents the address of the created contract account. The contract code (Code) and account storage (Storage) are kept in that contract account. The behavior of the smart contract is controlled by the contract code, while the account storage of the smart contract preserves the state of the contract. In other words, the smart contract causes a virtual account containing the contract code and account storage to be generated on the blockchain.
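The creation flow described above (a transaction with an empty To field whose Data field carries the contract code, resulting in a contract account that holds code and storage) can be modelled roughly as below. The class and function names are hypothetical, and the address derivation is deliberately simplified: Ethereum itself derives the address from a Keccak-256 hash of the RLP-encoded sender and nonce, whereas this sketch uses SHA-256 over a formatted string so that it runs with the standard library alone.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class Transaction:
    sender: str          # From field: account that initiates the creation
    to: Optional[str]    # To field: None stands for the empty value
    data: bytes          # Data field: contract bytecode on creation
    nonce: int = 0


@dataclass
class ContractAccount:
    code: bytes                                           # controls contract behavior
    storage: Dict[str, int] = field(default_factory=dict)  # preserves contract state


def create_contract(state: Dict[str, ContractAccount], tx: Transaction) -> str:
    """Create a contract account from a creation transaction and return its address."""
    if tx.to is not None:
        raise ValueError("a creation transaction must leave the To field empty")
    # Simplified, non-Ethereum address derivation (assumption for illustration only).
    digest = hashlib.sha256(f"{tx.sender}:{tx.nonce}".encode()).hexdigest()
    address = "0x" + digest[-40:]
    state[address] = ContractAccount(code=tx.data)
    return address
```

After `create_contract` runs, the world state maps the new address to an account whose `code` equals the transaction's Data field and whose `storage` starts out empty.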
As mentioned above, the Data field of the transaction that creates a smart contract may hold the bytecode of the smart contract. Bytecode consists of a series of bytes, each of which can identify an operation. For reasons such as development efficiency and readability, a developer may write smart contract code in a high-level language, such as Solidity, Serpent, or LLL, instead of writing bytecode directly. Smart contract code written in a high-level language can be compiled by a compiler into bytecode that can be deployed on the blockchain.
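To make "each byte can identify an operation" concrete, here is a toy stack interpreter that understands four EVM opcodes by their actual byte values (STOP 0x00, ADD 0x01, MUL 0x02, PUSH1 0x60). It is an illustrative sketch only: it ignores gas, memory, storage, and every other opcode, and the function name `run` is an assumption.

```python
from typing import List

STOP, ADD, MUL, PUSH1 = 0x00, 0x01, 0x02, 0x60
WORD = 2 ** 256  # EVM arithmetic wraps at 256 bits


def run(bytecode: bytes) -> List[int]:
    """Execute a tiny subset of EVM bytecode and return the final stack."""
    stack: List[int] = []
    pc = 0
    while pc < len(bytecode):
        op = bytecode[pc]
        pc += 1
        if op == STOP:
            break
        if op == PUSH1:
            stack.append(bytecode[pc])  # the next byte is the operand
            pc += 1
        elif op == ADD:
            stack.append((stack.pop() + stack.pop()) % WORD)
        elif op == MUL:
            stack.append((stack.pop() * stack.pop()) % WORD)
        else:
            raise ValueError(f"unsupported opcode 0x{op:02x}")
    return stack
```

For example, `run(bytes.fromhex("600260030160040200"))` pushes 2 and 3, adds them, pushes 4, multiplies, and stops, leaving `[20]` on the stack.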
Taking the Solidity language as an example, a contract written in it closely resembles a class (Class) in an object-oriented programming language, and a contract can declare various members, including state variables, functions, function modifiers, and events. A state variable is a value permanently stored in the account storage (Storage) field of the smart contract and is used to save the state of the contract.
As shown in FIG. 2, still taking Ethereum as an example, after Bob sends a transaction containing the information for invoking a smart contract to the Ethereum network, each node can execute the transaction in the EVM. In FIG. 2, the From field of the transaction records the address of the account that initiates the invocation, the To field records the address of the smart contract being invoked, and the Data field records the method and parameters of the invocation. After the smart contract is invoked, the account state of the contract account may change. Subsequently, a client can view the account state of the contract account through the blockchain node it is connected to (for example, node 1 in FIG. 2).
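The invocation flow can be sketched in the same spirit: a call transaction names the contract address in its To field and carries the method and parameters, and executing it changes the contract account's state. In Ethereum the Data field encodes the method as a 4-byte function selector followed by ABI-encoded arguments; the sketch below replaces that with a plain method name and argument tuple. All names here, including the `store_evidence` method, are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Tuple


@dataclass
class ContractAccount:
    methods: Dict[str, Callable[..., None]]             # stands in for contract code
    storage: Dict[str, Any] = field(default_factory=dict)


@dataclass(frozen=True)
class CallTransaction:
    sender: str                  # From field
    to: str                      # To field: address of the contract being invoked
    method: str                  # Data field, part 1: which method to call
    args: Tuple[Any, ...] = ()   # Data field, part 2: the call parameters


def store_evidence(storage: Dict[str, Any], sender: str, work_id: str, digest: str) -> None:
    """Hypothetical contract method: record a detection-result digest for a work."""
    storage[work_id] = {"submitter": sender, "digest": digest}


def execute(state: Dict[str, ContractAccount], tx: CallTransaction) -> None:
    """Run the named method against the storage of the contract addressed by To."""
    contract = state[tx.to]
    contract.methods[tx.method](contract.storage, tx.sender, *tx.args)
```

A client would then read `state[address].storage` to view the updated account state, much as the client in FIG. 2 queries the node it is connected to.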
A smart contract can be executed independently, in a prescribed manner, at each node in the blockchain network, and all execution records and data are stored on the blockchain, so that once a transaction has been executed, a transaction record that cannot be tampered with or lost is stored on the blockchain.
A schematic diagram of creating and invoking a smart contract is shown in FIG. 3. Creating a smart contract in Ethereum involves writing the smart contract, compiling it into bytecode, and deploying it to the blockchain. Invoking a smart contract in Ethereum means initiating a transaction that points to the smart contract's address; the EVM of each node can execute the transaction separately, so the smart contract code runs in a distributed manner in the virtual machine of each node in the Ethereum network.
Having introduced blockchain technology above, the blockchain-based infringement detection method provided by the present application is described below.
When the original work and the webpage content are text works, the blockchain-based infringement detection method shown in FIG. 4 may be adopted.
Referring to FIG. 4, FIG. 4 is a flowchart of a blockchain-based infringement detection method according to an exemplary embodiment of the present specification. The method may be applied to an electronic device and may include the following steps.
Step 402: the electronic device determines the work information of a registered original work; the work information comprises the work content attribute and the work content type of the original work;
Step 404: the electronic device searches a preset resource site list for at least one resource site matching the work content type of the original work, monitors the at least one resource site, and collects the webpage content of a webpage when a webpage of any of the resource sites is found to contain the work content attribute of the original work;
Step 406: the electronic device calculates the similarity between a first clustering result, obtained by clustering the to-be-detected word segmentation vectors produced by word segmentation of the webpage content, and a second clustering result, obtained by clustering the original word segmentation vectors produced by word segmentation of the original work, and determines the infringement similarity between the original work and the webpage content based on that similarity;
Step 408: the electronic device performs infringement detection on the webpage content and the original work according to the infringement similarity, and publishes the infringement detection result to the blockchain for evidence storage.
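Step 406 does not fix a particular segmentation, vectorization, clustering, or comparison technique at this point in the text, so the sketch below fills each slot with a simple stand-in and should be read as one possible instantiation under stated assumptions rather than the method of the application. Word segmentation is approximated by regex tokenization (Chinese text would need a proper segmenter), token vectors are hashed character-bigram counts, clustering is plain k-means with a deterministic start, and the two clustering results are compared by the symmetric mean of best-matching centroid cosines. All names and parameters (`DIM`, `k`, `iterations`) are hypothetical.

```python
import math
import re
import zlib
from typing import List

DIM = 16  # hypothetical vector size


def segment(text: str) -> List[str]:
    """Stand-in word segmentation: lowercase word tokens."""
    return re.findall(r"\w+", text.lower())


def embed(token: str) -> List[float]:
    """Hypothetical token vector: hashed character-bigram counts."""
    vec = [0.0] * DIM
    padded = f"^{token}$"
    for i in range(len(padded) - 1):
        vec[zlib.crc32(padded[i:i + 2].encode()) % DIM] += 1.0
    return vec


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def kmeans(vectors: List[List[float]], k: int, iterations: int = 10) -> List[List[float]]:
    """Plain k-means seeded with the first k vectors; returns the centroids."""
    centroids = [list(v) for v in vectors[:k]]
    for _ in range(iterations):
        groups: List[List[List[float]]] = [[] for _ in centroids]
        for v in vectors:
            nearest = min(
                range(len(centroids)),
                key=lambda i: sum((x - c) ** 2 for x, c in zip(v, centroids[i])),
            )
            groups[nearest].append(v)
        # Empty clusters keep their previous centroid.
        centroids = [
            [sum(col) / len(g) for col in zip(*g)] if g else c
            for g, c in zip(groups, centroids)
        ]
    return centroids


def cluster_similarity(first: List[List[float]], second: List[List[float]]) -> float:
    """Symmetric mean of each centroid's best cosine match in the other result."""
    if not first or not second:
        return 0.0
    a = sum(max(cosine(x, y) for y in second) for x in first) / len(first)
    b = sum(max(cosine(y, x) for x in first) for y in second) / len(second)
    return (a + b) / 2


def infringement_similarity(original: str, page: str, k: int = 3) -> float:
    first = kmeans([embed(t) for t in segment(page)], k)       # to-be-detected vectors
    second = kmeans([embed(t) for t in segment(original)], k)  # original vectors
    return cluster_similarity(first, second)
```

Comparing cluster centroids rather than full texts keeps this stage cheap, which fits its role as a first filter before the more detailed check in step 408.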
Wherein, the above-mentioned works information is the information related to works, and this works information can include: a work content attribute and a work content type. Of course, the work information may also include other content, which is only illustrated by way of example and is not specifically limited thereto.
The above-mentioned work content attributes may include the author of the creative work, keywords, summary of the work, content type of the work, and the like. The content attribute of the work is merely exemplified and not particularly limited.
The content type of the work expresses the category to which the content of the work belongs. For example, the content category of a work may be novel, prose, news, current-affairs commentary, entertainment, and so on. The content type of the work is only exemplified here and is not specifically limited.
The resource site may include a website, application software, and the various services provided by a website or application software, such as an applet service or a public account. The resource site is only exemplified here and is not specifically limited.
In addition, it should be noted that the electronic device may execute the flow from step 402 to step 408 when it receives an infringement detection instruction triggered by a user. Of course, in practical applications, the electronic device may also execute the flow from step 402 to step 408 periodically. The ways of triggering the infringement detection method provided by the present application are only exemplified here and are not specifically limited.
Each of the above steps 402 to 408 will be described in detail below.
Step 402: the electronic equipment determines the work information of the registered original works; the work information comprises the work content attribute and the work content type of the original work.
In implementation, when the creator completes creation of the creative work, the creative work is typically registered with a registration platform. After the registration of the original works is completed by the author, the registration platform can upload the original works and the information of the original works to the original works database.
Wherein, the works information of the original works can include: the content attribute of the original work, the content type of the original work, and the like. Here, the related information of the creative work is only exemplified and not specifically limited.
For example, an author completes the registration of a novel on the registration platform. The registration platform generally uploads the novel itself, its work content attributes (author, abstract, keywords, etc.), and its work content type (such as historical novel) to the original work library.
In determining the work information of the registered creative work, the electronic device may determine a work content attribute and a work content type of the creative work.
1. Determining the work content attribute of the original work
When the content attribute of the original work is determined, the electronic equipment can search the content attribute of the work corresponding to the original work from the original work library.
And if the content attribute of the work corresponding to the original work is recorded in the original work library, reading the content attribute of the work corresponding to the original work recorded in the original work library.
And if the content attribute of the work corresponding to the original work is not recorded in the original work library, identifying the content attribute of the original work from the original work.
For example, the electronic device may identify the work content attribute from the original work through steps A1 to A2:
Step A1: the electronic device may apply a preset word segmentation technology to the original work to obtain a plurality of words.
The preset word segmentation technology may include: a word segmentation method based on dictionary and word bank matching, a word segmentation method based on word frequency statistics, a word segmentation method based on rules and the like.
The word segmentation method based on dictionary and word bank matching works as follows: the text is split into candidate character strings according to a preset strategy, each character string is matched against a dictionary or word bank, and if the match succeeds, the matching word in the dictionary or word bank is taken as a recognized word. Methods of this kind may include: MM (forward Maximum Matching), RMM (Reverse Maximum Matching), the minimum segmentation method, the bidirectional matching method, and the like.
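As an illustration of the dictionary-matching approach, the following is a minimal Python sketch of forward maximum matching. The function name `forward_max_match`, the `max_len` parameter, and the four-word dictionary are assumptions made for this example and are not part of the disclosure.

```python
def forward_max_match(text, dictionary, max_len=4):
    """Segment text by greedily taking the longest dictionary word at each position."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a dictionary hit.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is emitted as-is when nothing longer matches.
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical dictionary and input, for illustration only.
dictionary = {"区块链", "侵权", "检测", "方法"}
tokens = forward_max_match("区块链侵权检测方法", dictionary)
```

Reverse maximum matching follows the same idea but scans from the end of the text toward the beginning.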
The word segmentation method based on word frequency statistics does not depend on a dictionary or a word bank; instead, it counts how often any two characters occur together in a text. The more often two characters occur together, the more likely they are to form a word. Specifically, the text may first be fully segmented, the frequency of occurrence of adjacent characters is then counted on that basis, and the text is segmented according to these frequencies. Models for implementing this method may include an N-gram model, a hidden Markov model, and the like; the models are only exemplified here and are not specifically limited.
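The co-occurrence idea can be sketched as follows. This is a deliberately simplified toy, not an N-gram or hidden Markov model: it counts adjacent character pairs and greedily merges any pair whose count reaches a threshold. The function names and the `min_count` parameter are assumptions for illustration.

```python
from collections import Counter


def adjacent_pair_counts(text):
    """Count how often each pair of adjacent characters occurs in the text."""
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


def segment_by_frequency(text, min_count=2):
    """Greedily merge adjacent characters whose pair count reaches min_count."""
    counts = adjacent_pair_counts(text)
    tokens = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if len(pair) == 2 and counts[pair] >= min_count:
            tokens.append(pair)      # frequent pair: treat it as one word
            i += 2
        else:
            tokens.append(text[i])   # otherwise emit a single character
            i += 1
    return tokens


# "侵权" occurs twice in this hypothetical input, so it is merged into a word.
pair_tokens = segment_by_frequency("侵权检测与侵权取证")
```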
The word segmentation method based on the rules is mainly based on syntax and grammar analysis, combines semantic analysis, and defines words through analysis of information provided by context content, thereby achieving the purpose of word segmentation.
The word segmentation method is only exemplary, and in practical applications, the electronic device may also adopt other word segmentation methods, and the word segmentation method is not specifically limited herein.
Step A2: the electronic device may select, from the obtained words, a plurality of words that express the characteristics of the original work, and use them as the work content attribute of the original work.
For example, the electronic device may count the frequency with which each word occurs in the original work, select the words whose frequency is higher than a preset threshold as keywords, and use these keywords as the words expressing the characteristics of the original work. This is only one example of selecting, from the obtained words, the words that express the characteristics of the original work, and is not specifically limited.
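Step A2 can be sketched with a frequency counter. The function name `select_keywords`, the threshold value, and the sample words are hypothetical.

```python
from collections import Counter


def select_keywords(words, threshold):
    """Return the words that occur more than `threshold` times, sorted for stable output."""
    counts = Counter(words)
    return sorted(word for word, count in counts.items() if count > threshold)


# Hypothetical segmentation output of an original work.
words = ["dynasty", "emperor", "dynasty", "river", "emperor", "dynasty"]
keywords = select_keywords(words, threshold=1)
```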
2. Determining a work content type of a creative work
In an embodiment of the present specification, when obtaining the content type of a creative work, the electronic device may search in the database of creative works for whether the content type of the creative work exists. And if so, reading the content type of the original work. And if the type does not exist, determining the content type of the creative work based on the determined content attribute of the creative work.
When the content type of the creative work is determined based on the determined content attribute of the creative work, the electronic equipment can input the content attribute of the creative work into the trained classification model, and the content type of the creative work is identified based on the content attribute of the creative work through the classification model.
The electronic equipment can receive at least one work content type output by the classification model and the corresponding probability value thereof, and selects the work content type with the maximum probability value as the work content type of the original work. Of course, the electronic device may also select at least one work content type with a probability value greater than a preset threshold value as the work content type of the creative work.
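The two selection rules just described (highest probability, or every type above a threshold) can be sketched as below. The probability dictionary stands in for the output of the classification model; its values and the function names are assumptions.

```python
def pick_top_type(probabilities):
    """Return the content type with the largest probability value."""
    return max(probabilities, key=probabilities.get)


def pick_types_above(probabilities, threshold):
    """Return every content type whose probability exceeds the threshold."""
    return [name for name, p in probabilities.items() if p > threshold]


# Hypothetical classifier output: content type -> probability.
probabilities = {"novel": 0.7, "prose": 0.2, "news": 0.1}
top_type = pick_top_type(probabilities)
candidate_types = pick_types_above(probabilities, threshold=0.15)
```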
The classification model is trained on a large number of sample-label pairs. In each pair, the sample is the work content attribute and the label is the corresponding work content type.
The classification model may be built with LightGBM (Light Gradient Boosting Machine); it may also be built with a back propagation (BP) neural network, a support vector machine (SVM), a logistic regression model, a random forest, or another classification model, which is not specifically limited herein.
Step 404: the electronic equipment searches for at least one resource site matched with the content type of the works of the original works in a preset resource site list, monitors the at least one resource site, and collects the webpage content of the webpage when monitoring that the webpage of any resource site contains the content attribute of the works of the original works.
In an embodiment of the application, a list of resource sites is maintained on an electronic device. The resource site list maintains resource site identification and the content type of the works contained by the resource site. For example, the list of resource sites can be as shown in Table 1.
Table 1 (resource site list: resource site identifier and the work content types contained by each resource site; the table image is not reproduced here)
Of course, in practical applications, the resource site list may further include other content, such as a resource site domain name and a resource site priority; the resource site list is only exemplified here and is not specifically limited. The resource site identifier may be an address, a URL, or the like of the resource site, and is likewise only exemplified and not specifically limited.
It should be noted that each resource site may correspond to one type of work content, or may correspond to multiple types of work content, and is not specifically limited herein.
In this embodiment of the present specification, the electronic device may use the content type of the original work as a key, and search, in the resource site list described in table 1, a resource site identifier corresponding to the key, as at least one resource site matching the content type of the original work.
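The lookup can be sketched as a filter over the resource site list. The contents of Table 1 are not available here, so the entries below (identifiers on `example.com` and `example.org`, and their content types) are invented purely for illustration, as is the function name.

```python
# Hypothetical resource site list: identifier plus the content types it contains.
RESOURCE_SITES = [
    {"site_id": "https://novels.example.com", "content_types": {"novel"}},
    {"site_id": "https://news.example.com", "content_types": {"news", "current reviews"}},
    {"site_id": "https://reading.example.org", "content_types": {"novel", "prose"}},
]


def find_matching_sites(content_type, site_list):
    """Return identifiers of all sites whose content types include `content_type`."""
    return [site["site_id"] for site in site_list if content_type in site["content_types"]]


novel_sites = find_matching_sites("novel", RESOURCE_SITES)
```

Because a site may carry several content types, the sketch stores them as a set per site rather than a single value.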
Certainly, in practical application, the electronic device may further search for at least one resource site matching the original work in a preset resource site list based on information such as a name of the original work, keywords of the original work, and the like. It is to be understood that the description is illustrative only and is not to be construed as limiting in any way.
In an embodiment of the present specification, the electronic device may monitor the at least one resource site by using a web crawler technology, and acquire web page content of a web page when it is monitored that a web page of any resource site of the at least one resource site contains a work content attribute of the creative work.
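The check that decides whether a monitored page should be collected can be sketched as a substring match against the work content attributes. Fetching pages with a crawler is omitted; the page text is assumed to be already available. The `min_hits` parameter is an assumption, since the text does not say how many attributes must appear.

```python
def matched_attributes(page_text, attributes):
    """Return the work content attributes that appear in the page text."""
    return [attr for attr in attributes if attr in page_text]


def should_collect(page_text, attributes, min_hits=1):
    """Decide whether to collect the page, given a minimum number of attribute hits."""
    return len(matched_attributes(page_text, attributes)) >= min_hits


# Hypothetical page text and attributes (title, author, keyword).
page = "Chapter 1 of The Long River, a historical novel by Zhang San"
attrs = ["The Long River", "Zhang San", "Tang dynasty"]
hits = matched_attributes(page, attrs)
```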
Step 406: the electronic device calculates the similarity between a first clustering result, obtained by clustering the to-be-detected word segmentation vectors produced by segmenting the web page content, and a second clustering result, obtained by clustering the original word segmentation vectors produced by segmenting the original work, and determines the infringement similarity of the original work and the web page content based on that similarity;
Step 408: the electronic device performs infringement detection on the web page content and the original work according to the infringement similarity, and publishes the infringement detection result to the blockchain for evidence storage.
When the webpage content and the original works are both text works, the following infringement detection calculation modes can be adopted to carry out infringement detection calculation on the webpage content and the original works.
The first way of realizing infringement detection calculation is as follows: the electronic equipment can calculate the infringement similarity representing the whole infringement condition of the webpage content and the original works and the text similarity representing the detail infringement condition of the webpage content and the original works, and carry out infringement detection calculation on the webpage content and the original works based on the infringement similarity and the text similarity, which can be seen in steps B1-B5.
Step B1: the electronic device performs word segmentation on the original work and on the web page content respectively, and vectorizes the resulting words to obtain a plurality of original word segmentation vectors corresponding to the original work and a plurality of to-be-detected word segmentation vectors corresponding to the web page content.
The word segmentation processing method is shown as step a1 above, and is not described here again.
In the embodiment of the present specification, after word segmentation yields a plurality of words for the original work and a plurality of words for the web page content, the electronic device may vectorize each set of words separately to obtain the original word segmentation vectors and the to-be-detected word segmentation vectors.
For example, the electronic device may use Word2vec (Word to Vector, a family of models for generating word vectors) to vectorize the words of the original work and the words of the web page content respectively.
The technology adopted in the "vectorization processing of the participle" is only exemplarily described here, but of course, in practical applications, the electronic device may also adopt other participle vectorization technologies to implement the vectorization processing of the participle, and is not specifically limited here.
Step B2: the electronic device clusters the plurality of original word segmentation vectors and the plurality of to-be-detected word segmentation vectors separately, obtaining a first clustering result corresponding to the original word segmentation vectors and a second clustering result corresponding to the to-be-detected word segmentation vectors.
In implementation, the electronic device may use a K-Means (a clustering algorithm) method to perform clustering processing on a plurality of original word segmentation vectors and a plurality of word segmentation vectors to be detected, respectively.
Specifically, the following takes clustering the original word segmentation vectors with the K-Means method as an example; clustering the to-be-detected word segmentation vectors with the K-Means method works the same way and is not described again.
The electronic device may pre-select K original word segmentation vectors as initial cluster centers, and then divide the original word segmentation vectors into a plurality of clusters based on the distance between each original word segmentation vector and the cluster centers. The electronic device then recalculates the center of each cluster as the new cluster center and repeats the division step until the cluster centers no longer change or a preset number of iterations is reached.
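The loop described above can be sketched in plain Python. This sketch assumes the first K vectors are chosen as the initial centers and uses Euclidean distance; both choices, and the function name `kmeans`, are assumptions rather than details given in the text.

```python
import math


def kmeans(vectors, k, max_iter=100):
    """Cluster vectors with K-Means; returns (centers, assignments)."""
    centers = [list(v) for v in vectors[:k]]  # assumed: first k vectors as initial centers
    assignments = []
    for _ in range(max_iter):
        # Assign each vector to its nearest center.
        assignments = [
            min(range(k), key=lambda c: math.dist(v, centers[c])) for v in vectors
        ]
        # Recompute each center as the mean of its members.
        new_centers = []
        for c in range(k):
            members = [v for v, a in zip(vectors, assignments) if a == c]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:  # stop once the centers no longer change
            break
        centers = new_centers
    return centers, assignments


# Hypothetical 2-D word vectors forming two obvious groups.
vectors = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (10.0, 11.0)]
centers, assignments = kmeans(vectors, k=2)
```

The radius of a cluster, mentioned as part of the clustering result, could then be taken as the largest distance from the center to any member.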
The first clustering result obtained by clustering the original word segmentation vector by the electronic device may include: the distribution of each original word segmentation vector in each cluster, the corresponding cluster center of each cluster, the radius of each cluster, and the like.
Here, the clustering result is merely exemplified and not particularly limited.
Similarly, the second clustering result obtained by clustering the word segmentation vector to be detected by the electronic device may include: the distribution of each word segmentation vector to be detected in each cluster, the corresponding cluster center of each cluster, the radius of each cluster and the like.
Of course, the electronic device may also use other clustering methods, such as DBSCAN (Density-Based Spatial Clustering of Applications with Noise, a density-based clustering algorithm). The clustering algorithm is only exemplified here and is not specifically limited.
Step B3: the electronic device can determine a similarity of the first and second clustered results.
In implementation, the electronic device may vectorize the first clustering result and the second clustering result respectively to obtain an original result vector corresponding to the first clustering result and a to-be-detected result vector corresponding to the second clustering result. The electronic device may then calculate the vector distance between the original result vector and the to-be-detected result vector as the similarity of the first clustering result and the second clustering result.
The original result vector can represent the attribute of each cluster. For example, the original result vector may characterize the cluster center of each cluster, the radius of the cluster, the distribution of the original participle vectors in the cluster (i.e., the distance of the original participle vectors in the cluster from the cluster center), and so on. The original result vector is only exemplified here, and is not particularly limited.
The to-be-detected result vector can likewise represent the attributes of each cluster. For example, it may represent the cluster center of each cluster, the radius of each cluster, the distribution of the to-be-detected word segmentation vectors within the clusters (i.e., the distance between each to-be-detected word segmentation vector in a cluster and the cluster center), and the like. The to-be-detected result vector is only exemplified here and is not specifically limited.
The vector distance may include a euclidean distance, a manhattan distance, a cosine distance, and the like. The vector distance is merely exemplary and not particularly limited.
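The three distances named above can be written out directly. The function names and the sample vectors are chosen for illustration; cosine distance is taken here as one minus the cosine similarity.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_distance(a, b):
    """One minus the cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Hypothetical result vectors; these two are orthogonal.
original_result = (3.0, 0.0)
detected_result = (0.0, 4.0)
d_euclid = euclidean(original_result, detected_result)
d_manhattan = manhattan(original_result, detected_result)
d_cosine = cosine_distance(original_result, detected_result)
```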
Step B4: the electronic device determines the infringement similarity of the original work and the web page content based on the similarity of the first clustering result and the second clustering result.
In implementation, in an optional implementation manner, the electronic device may convert the similarity between the first clustering result and the second clustering result into an infringement similarity between the original work and the web page content based on a preset algorithm.
In another optional implementation manner, a corresponding relation table of the clustering result similarity and the infringement similarity is preconfigured on the electronic device, the electronic device may use the similarity of the first clustering result and the second clustering result as a keyword, and in the corresponding relation table, the infringement similarity corresponding to the keyword is searched for as the infringement similarity of the original work and the web page content.
In another alternative implementation manner, the electronic device may directly use the similarity between the first clustering result and the second clustering result as an infringement similarity between the creative work and the web page content.
Here, the example of "the electronic device determines the infringement similarity between the creative work and the web page content based on the similarity between the first clustering result and the second clustering result" is only described, and the implementation of the method is not specifically limited.
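Two of the options for step B4 can be sketched as follows. The text does not specify the preset algorithm or the contents of the correspondence table, so the formula `1 / (1 + distance)` and the bucket boundaries and scores below are assumptions chosen only to show the shape of each approach.

```python
from bisect import bisect_right


def similarity_from_distance(distance):
    """Assumed preset algorithm: map a non-negative distance into (0, 1]."""
    return 1.0 / (1.0 + distance)


# Assumed correspondence table: distance buckets -> infringement similarity.
DISTANCE_BOUNDS = [0.5, 2.0]            # buckets: [0, 0.5), [0.5, 2.0), [2.0, inf)
INFRINGEMENT_SCORES = [0.9, 0.6, 0.1]   # one score per bucket


def lookup_infringement_similarity(distance):
    """Look up the infringement similarity for the bucket containing `distance`."""
    return INFRINGEMENT_SCORES[bisect_right(DISTANCE_BOUNDS, distance)]


score_formula = similarity_from_distance(1.0)
score_table = lookup_infringement_similarity(1.0)
```

A bucketed table is used because the clustering result similarity is a continuous value, so an exact-key lookup would rarely find a match.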
Step B5: the electronic device may perform infringement detection on the web page content according to the infringement similarity.
In an alternative implementation, the electronic device may determine whether the infringement similarity between the creative work and the web content exceeds a preset first threshold. And if the infringement similarity of the original works and the webpage content exceeds a preset first threshold value, determining the webpage content as the infringement works. And if the infringement similarity of the original works and the webpage content does not exceed a preset first threshold, determining that the webpage content is not the infringement work.
Of course, in practical applications, infringement detection based on the clustering result alone may produce false detections. To improve the accuracy of infringement detection, after the check based on the infringement similarity, the electronic device may further check the text similarity of the original work and the web page content.
When implemented, the electronic device may determine whether the infringement similarity of the creative work and the web content exceeds a preset first threshold. If the infringement similarity exceeds a preset first threshold value, the electronic equipment further calculates the text similarity of the original works and the webpage content, and detects whether the text similarity of the original works and the webpage content exceeds a preset second threshold value. And if the text similarity of the original works and the webpage content exceeds a preset second threshold, determining the webpage content as infringing works. And if the text similarity of the original works and the webpage content does not exceed a preset second threshold, determining that the webpage content is not infringing works.
And if the infringement similarity does not exceed a preset first threshold value, determining that the webpage content is not an infringement product.
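The two-stage check can be sketched as below. The text similarity is passed in as a callable so that it is computed only when the first threshold is exceeded, mirroring the order described above. The threshold values and names are hypothetical.

```python
def detect_infringement(infringement_similarity, compute_text_similarity,
                        first_threshold=0.8, second_threshold=0.9):
    """Return True only if both the infringement and text similarity exceed their thresholds."""
    if infringement_similarity <= first_threshold:
        return False  # stage 1 fails: not infringing, text similarity is never computed
    return compute_text_similarity() > second_threshold  # stage 2


calls = []


def fake_text_similarity():
    """Stand-in for a real text similarity computation; records each call."""
    calls.append(1)
    return 0.95


result_low = detect_infringement(0.5, fake_text_similarity)
result_high = detect_infringement(0.85, fake_text_similarity)
```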
The following describes an implementation manner of calculating the text similarity between the creative work and the web page content.
The first method is as follows: the electronic equipment calculates the text similarity of the original works and the webpage content based on a preset Hash algorithm.
In implementation, the electronic device may calculate a first hash value of the creative work and a second hash value of the web content by using a preset hash algorithm. And then the electronic equipment calculates the similarity of the first hash value and the second hash value as the text similarity of the creative work and the webpage content.
The preset hash algorithm adopted in the present specification has the following characteristic: when two works differ only in individual characters, the hash values calculated for them by the preset hash algorithm are the same. A hash algorithm with this characteristic prevents two works that differ only in individual characters from being judged to have low text similarity, thereby avoiding false detections.
The preset hash algorithm may be a simhash algorithm or a minhash algorithm. Of course, in practical applications, the preset hash algorithm may be other hash algorithms as long as the preset hash algorithm meets the above characteristics, and here, the preset hash algorithm is only exemplarily described and is not specifically limited.
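A minimal simhash sketch is shown below. It uses MD5 from the standard library as the per-word hash and gives every word a weight of one; both are assumptions, as are the function names. In general, simhash gives near-duplicate texts fingerprints that differ in only a few bits rather than strictly identical ones, so the sketch compares fingerprints by Hamming distance.

```python
import hashlib


def simhash(words, bits=64):
    """Compute a simhash fingerprint (bits <= 64) from a list of words."""
    weights = [0] * bits
    for word in words:
        digest = hashlib.md5(word.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")  # 64-bit hash of the word
        for i in range(bits):
            weights[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i, w in enumerate(weights):
        if w > 0:  # bit is set when more words voted 1 than 0
            fingerprint |= 1 << i
    return fingerprint


def hamming_similarity(a, b, bits=64):
    """Fraction of fingerprint bits on which a and b agree."""
    return 1 - bin(a ^ b).count("1") / bits


# Hypothetical segmented texts.
first_hash = simhash(["block", "chain", "infringement", "detection"])
second_hash = simhash(["block", "chain", "infringement", "detection"])
text_similarity = hamming_similarity(first_hash, second_hash)
```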
The second method comprises the following steps: the electronic device may determine the text similarity of the creative work and the web page content based on a deep learning algorithm calculation.
When the method is implemented, the electronic equipment can input the original works and the webpage content or the characteristics of the original works or the webpage content into a preset similarity calculation model so as to calculate the similarity of the original works and the webpage content through the similarity calculation model. Wherein, the similarity calculation model is trained by a large number of samples.
The similarity calculation model may be a DSSM (Deep Structured Semantic Model) or a Tree-LSTM (tree-structured Long Short-Term Memory) model. Of course, in practical applications, other models may be used; the models are only exemplified here and are not specifically limited.
The third method comprises the following steps: the electronic equipment calculates the text similarity of the creative work and the webpage content based on the word vector space of the creative work and the webpage content.
In implementation, the electronic device may construct a word vector space of the original work based on all word vectors of the original work, and construct a word vector space of the web page content based on all word vectors of the web page content. The electronic device may then calculate the similarity between the two word vector spaces as the text similarity between the original work and the web page content.
The text similarity calculation is only an exemplary illustration, and in practical application, other ways may be adopted to calculate the text similarity.
The second way of realizing infringement detection calculation is as follows: the electronic device calculates the text similarity of the web page content and the original work. If the calculated text similarity exceeds a preset threshold, the web page content is determined to infringe the original work. If the calculated text similarity does not exceed the preset threshold, the web page content is determined not to infringe the original work.
The method for calculating the text similarity between the webpage content and the original works is as described above, and is not repeated here.
In addition, in the embodiment of the present specification, proving that web page content infringes an original work usually requires not only the calculation result of the infringement detection calculation as evidence, but also proof that the evidence collection process is secure and reliable, that is, that the calculation process of the infringement detection calculation and the process of collecting the web page content are reliable and have not been tampered with. Therefore, in the implementation of the present application, the electronic device needs to publish the calculation result, the calculation process, and other information related to the infringement detection calculation, together with the page collection process, to the blockchain for evidence storage.
Therefore, in the embodiment of the present application, the infringement detection result may include: a calculation process of infringement detection calculation and a calculation result of infringement detection calculation.
The calculation process of the infringement detection calculation includes the intermediate results generated at each step of detecting the original work and the web page content, such as the clustering result similarity, the infringement similarity, and the text similarity of the original work and the web page content described above. The calculation process of the infringement detection is only exemplified here and is not specifically limited.
The calculation results of the infringement detection calculation include: whether the webpage content infringes the original works, webpage content identification, original work identification and the like. Here, the infringement detection result is merely exemplified and not particularly limited.
The web page content collection process may be represented by a video file recorded by the electronic device during page capture, a log file generated while the electronic device captures the web page content, or the like. The specific representation of the web page content collection process is only exemplified here and is not specifically limited.
After the electronic equipment acquires the infringement detection result and the webpage content acquisition process, the infringement detection result and the webpage content acquisition process can be issued to the block chain for evidence storage, so that the safety and reliability of the calculation process of infringement detection calculation, the calculation result and the webpage content acquisition process are guaranteed.
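The evidence to be stored can be sketched as a record that bundles the calculation process, the calculation result, and a digest of the collection log, together with a digest of the whole record. The text does not specify a record format or a blockchain API, so the field names are hypothetical, the use of SHA-256 digests is an assumption, and the actual submission to a chain is omitted.

```python
import hashlib
import json


def build_evidence_record(work_id, page_id, intermediate_results, infringing, capture_log):
    """Bundle detection results with a digest of the capture log; return (record, digest)."""
    record = {
        "work_id": work_id,
        "page_id": page_id,
        "intermediate_results": intermediate_results,  # calculation process
        "infringing": infringing,                      # calculation result
        "capture_log_sha256": hashlib.sha256(capture_log).hexdigest(),
    }
    # Canonical JSON so the same record always yields the same digest.
    payload = json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return record, hashlib.sha256(payload).hexdigest()


# Hypothetical identifiers, scores, and log line.
record, digest = build_evidence_record(
    "work-001",
    "page-042",
    {"cluster_similarity": 0.12, "infringement_similarity": 0.89, "text_similarity": 0.93},
    True,
    b"2020-01-01T00:00:00Z GET https://novels.example.com/page-042 200\n",
)
```

Any later change to the record or to the log changes the digest, which is what makes the stored evidence checkable.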
Furthermore, it should be noted that, to ensure the security of the page collection process and the infringement detection calculation process, the electronic device may execute both processes in an off-chain secure execution environment.
Of course, the electronic device may also execute the page collection process in an off-chain secure execution environment and then perform the infringement detection calculation on the original work and the web page content through a smart contract deployed on the blockchain for that purpose.
In implementation, the original work is stored on the blockchain, and a smart contract for performing infringement detection calculation on the original work is deployed on the blockchain.
The electronic device may be a node device of a blockchain.
A client of the blockchain can issue an instruction to perform infringement detection on the original work. In response to that instruction, the electronic device calls the infringement detection logic in the smart contract deployed on the blockchain and performs the infringement detection calculation on the web page content and the original work.
The above instruction may be a transaction issued by the client to the blockchain, and the instruction is only exemplary and is not specifically limited.
According to the above description, on the one hand, the electronic device can actively search the whole network for web page content suspected of infringement based on the work content attribute of the registered original work, so that suspected infringing works are obtained more promptly and infringing works are in turn determined more promptly.
On the other hand, since data stored on the blockchain cannot be tampered with, publishing the infringement detection result to the blockchain for storage prevents the result from being tampered with and ensures its security.
In the third aspect, the electronic device can also publish the page collection process to the blockchain, which ensures the security and reliability not only of the infringement detection result but also of the forensics process (namely, the page collection process).
In addition, in the implementation of the present specification, the web page content and the original work may also be image works, audio-video works, and the like.
1. The following describes, through steps C1 to C3, infringement detection when the web page content and the original work are image works.
Step C1, determining the work information of the registered original works; the work information comprises the work content attribute and the work content type of the original work.
In determining the work information of the registered creative work, the electronic device may determine a work content attribute and a work content type of the creative work.
1) Determining the work content attribute of the creative work
In implementation, the electronic device may search the content attribute of the work corresponding to the creative work from the creative work library.
And if the content attribute of the work corresponding to the original work is recorded in the original work library, reading the content attribute of the work corresponding to the original work recorded in the original work library.
And if the content attribute of the work corresponding to the original work is not recorded in the original work library, identifying the content attribute of the original work from the original work.
When identifying the content attribute from the original work, the electronic device can perform image recognition on the image work and use the image recognition result as the content attribute of the original work.
For example, the creative work is an image including a "dog", and the electronic device may perform image recognition on the creative work. The image recognition result is "dog", and the electronic device can use the "dog" as the content attribute of the original work.
The electronic device can use deep learning to perform image recognition on the image work. For example, the electronic device may input the image work into a trained neural network, which performs the image recognition. The neural network may be a CNN (Convolutional Neural Network). The neural network is only an example here and is not specifically limited.
The image recognition method is only exemplified here, and other image recognition methods may be adopted in practical applications, and are not specifically limited here.
2) Determining a work content type of a creative work
When the content type of the original work is obtained, the electronic equipment can search whether the content type of the original work exists in the original work database. And if so, reading the content type of the original work. And if the type does not exist, determining the content type of the creative work based on the determined content attribute of the creative work.
When the content type of the creative work is determined based on the determined content attribute of the creative work, the electronic equipment can input the content attribute of the creative work into the trained classification model, and the content type of the creative work is identified based on the content attribute of the creative work through the classification model.
For example, the creative work is an image including "dog", and the electronic device may perform image recognition on the creative work. The image recognition result is "dog", and the electronic device can use the "dog" as the content attribute of the original work.
The electronic device may input the content attribute of the work (i.e., "dog") to a preset classification model to identify the content type of the work of the creative work based on the content attribute of the work of the creative work. For example, the identified content type of the work is an animal image.
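The lookup-then-fallback logic of step C1 can be sketched as follows. This is only an illustration under stated assumptions: the original work library is modeled as a plain dictionary, and `stub_recognize` and `stub_classify` are hypothetical stand-ins for the trained neural network and the trained classification model, which the specification does not detail.

```python
def resolve_work_info(work_id, work, library, recognize, classify):
    """Return (content attribute, content type) for a registered work.

    Values recorded in the library are read directly; anything missing is
    derived: the attribute from the work itself, the type from the attribute.
    """
    entry = library.get(work_id, {})
    attribute = entry.get("attribute")
    if attribute is None:
        attribute = recognize(work)      # e.g. image recognition
    work_type = entry.get("type")
    if work_type is None:
        work_type = classify(attribute)  # e.g. classification model
    return attribute, work_type


def stub_recognize(work):
    # Hypothetical stand-in for a trained image recognition network.
    return "dog"


def stub_classify(attribute):
    # Hypothetical stand-in for a trained classification model.
    return {"dog": "animal image"}.get(attribute, "unknown")
```

For the "dog" example above, an empty library yields `("dog", "animal image")` through the two stubs, while a library that already records both values returns them without calling either model.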
Step C2, the electronic device searches a preset resource site list for at least one resource site matching the work content type of the creative work, monitors the at least one resource site, and collects the web page content of a web page when it detects that a web page of any of the resource sites contains the work content attribute of the creative work.
See step 404 above for details, which are not repeated here.
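As a rough sketch of step C2, the site matching and the collection condition could look like the following. The site list structure, the `example.com` URLs, and the function names are assumptions made for illustration. Actual monitoring (fetching pages over the network) is left out, and the attribute check is reduced to a case-insensitive substring test.

```python
SITE_LIST = [
    # Hypothetical preset resource site list: each site declares the
    # content types it is known to host.
    {"url": "https://images.example.com", "types": {"animal image", "landscape image"}},
    {"url": "https://audio.example.com", "types": {"music"}},
]


def match_sites(site_list, work_type):
    """Return the URLs of sites whose declared types include work_type."""
    return [site["url"] for site in site_list if work_type in site["types"]]


def collect_if_relevant(page_text, attribute):
    """Return the page content if it mentions the attribute, else None."""
    if attribute.lower() in page_text.lower():
        return page_text
    return None
```

Restricting monitoring to sites of the matching content type keeps the number of pages that must be inspected small compared with scanning every listed site.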
Step C3, the electronic device performs the infringement detection calculation on the original work and the web page content.
In an optional implementation manner, the electronic device may input the web page content and the creative work into a trained neural network to perform image recognition, and obtain an image recognition result corresponding to the web page content and an image recognition result corresponding to the creative work, respectively.
The electronic device can calculate the similarity between the creative work and the web page content based on the image recognition result of the creative work and the image recognition result of the web page content. If the calculated similarity is higher than a preset threshold, the web page content is determined to infringe the original work. If the calculated similarity is not higher than the preset threshold, the web page content is determined not to infringe the original work.
Here, the infringement detection calculation is merely an exemplary illustration, and is not particularly limited.
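The specification does not say how the two image recognition results are compared. One possible reading, shown here purely as an assumption, treats each recognition result as a set of labels and uses the Jaccard index as the similarity, followed by the "higher than the threshold" decision described above.

```python
def jaccard(labels_a, labels_b):
    """Jaccard index of two label collections (0.0 when both are empty)."""
    a, b = set(labels_a), set(labels_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def is_infringing(similarity, threshold):
    """Infringement is determined only when similarity is strictly higher."""
    return similarity > threshold
```

With a hypothetical threshold of 0.6, the label sets `{"dog", "grass"}` and `{"dog", "grass", "ball"}` share two of three labels, so the similarity of about 0.67 exceeds the threshold.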
In another optional implementation, the electronic device extracts at least one feature vector from the creative work and at least one feature vector from the web page content, and calculates the similarity between the creative work and the web page content based on the extracted feature vectors. If the calculated similarity is higher than a preset threshold, the web page content is determined to infringe the original work. If the calculated similarity is not higher than the preset threshold, the web page content is determined not to infringe the original work.
Here, the manner of detecting infringement when the original work and the web page work are images is only described as an example, and is not particularly limited.
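For the feature-vector variant, a minimal sketch is given below. Cosine similarity and the "best match per original vector, then average" aggregation are assumptions chosen for illustration, since the specification leaves both the feature extractor and the similarity measure open.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def work_similarity(original_vecs, page_vecs):
    """Average, over the original vectors, of the best cosine match
    found among the page vectors."""
    best = [max(cosine(o, p) for p in page_vecs) for o in original_vecs]
    return sum(best) / len(best)
```

The resulting value can then be compared with the preset threshold in the same way as in the first implementation.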
2. Infringement detection of the web content and the creative work as an audio work or an audio-visual work is described through steps D1 to D3.
Step D1, determining the work information of the registered original works; the work information comprises the work content attribute and the work content type of the original work.
In determining the work information of the registered creative work, the electronic device may determine a work content attribute and a work content type of the creative work.
1) Determining the work content attribute of the creative work
In implementation, the electronic device may search the content attribute of the work corresponding to the creative work from the creative work library.
And if the content attribute of the work corresponding to the original work is recorded in the original work library, reading the content attribute of the work corresponding to the original work recorded in the original work library.
And if the content attribute of the work corresponding to the original work is not recorded in the original work library, identifying the content attribute of the original work from the original work.
When the content attribute of the original work is identified from the original work, the electronic equipment can convert the audio information in the original work into text information, perform word segmentation processing on the converted text information, and screen out a plurality of words expressing the characteristics of the original work from the obtained words to serve as the content attribute of the original work.
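The keyword screening step can be sketched as follows, assuming the speech-to-text conversion has already produced a transcript. The regular-expression tokenizer is only a stand-in for real word segmentation (Chinese text would need a dedicated segmenter), and the stop-word list and frequency-based screening are illustrative assumptions.

```python
import re
from collections import Counter

# Small illustrative stop-word list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "on"}


def content_attributes(transcript, top_k=3):
    """Split the transcript into words, drop stop words, and keep the
    top_k most frequent remaining words as content attributes."""
    words = re.findall(r"[a-z']+", transcript.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return [word for word, _ in counts.most_common(top_k)]
```

For the transcript "The river runs to the sea and the river meets the sea in the river delta", asking for two attributes yields "river" and "sea".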
2) Determining a work content type of a creative work
When the content type of the original work is obtained, the electronic equipment can search whether the content type of the original work exists in the original work database. And if so, reading the content type of the original work. And if the type does not exist, determining the content type of the creative work based on the determined content attribute of the creative work.
When the content type of the creative work is determined based on the determined content attribute of the creative work, the electronic equipment can input the content attribute of the creative work into the trained classification model, and the content type of the creative work is identified based on the content attribute of the creative work through the classification model.
For example, the electronic device may extract a number of segments from the text information into which the audio information is converted and input the number of segments into a preset classification model, so that the classification model identifies the content type of the original work based on the content attribute of the original work.
Step D2, the electronic device searches a preset resource site list for at least one resource site matching the work content type of the creative work, monitors the at least one resource site, and collects the web page content of a web page when it detects that a web page of any of the resource sites contains the work content attribute of the creative work.
See step 404 above for details, which are not repeated here.
Step D3, the electronic device performs the infringement detection calculation on the original work and the web page content.
For example, when the original work and the web page content are audio works or audio and video works, the electronic device can respectively input the web page content and the original work into a trained neural network for voiceprint recognition, and a voiceprint recognition result corresponding to the web page content and a voiceprint recognition result corresponding to the original work are respectively obtained.
The electronic device can calculate the similarity between the original work and the web page content based on the voiceprint recognition result of the original work and the voiceprint recognition result of the web page content. If the calculated similarity is higher than a preset threshold, the web page content is determined to infringe the original work. If the calculated similarity is not higher than the preset threshold, the web page content is determined not to infringe the original work.
For example, when the creative work and the web page content are video works, the electronic device may extract a plurality of designated first video frames from the creative work and a plurality of designated second video frames from the web page content, and calculate the similarities between the first video frames and the second video frames. The electronic device can then calculate the similarity between the creative work and the web page content based on these frame similarities. If the calculated similarity is higher than a preset threshold, the web page content is determined to infringe the original work. If the calculated similarity is not higher than the preset threshold, the web page content is determined not to infringe the original work.
The designated first video frames and the designated second video frames are video frames that reflect the content of the original work and the content of the web page, respectively. For example, they may be preset video frames, such as one frame extracted every 10 frames. This is not specifically limited here.
When calculating the similarity between the creative work and the web page content from the frame similarities, the electronic device may combine the similarity of each pair of first and second video frames with the weight corresponding to that similarity. This is not specifically limited here.
It should be noted that, here, the infringement detection calculation is only an exemplary illustration, and is not specifically limited thereto.
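The frame sampling and the weighted combination of frame similarities described above can be sketched as follows. The normalized weighted average is an assumption, since the specification only says that each frame similarity has a corresponding weight, and the per-frame similarity values are taken as given inputs.

```python
def sample_frames(frames, step=10):
    """Keep one frame out of every `step` frames, starting with the first."""
    return frames[::step]


def weighted_similarity(frame_sims, weights):
    """Weighted average of per-frame similarities."""
    if len(frame_sims) != len(weights):
        raise ValueError("each frame similarity needs exactly one weight")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return sum(s * w for s, w in zip(frame_sims, weights)) / total
```

For instance, frame similarities of 1.0 and 0.5 with weights 3 and 1 combine to (3 × 1.0 + 1 × 0.5) / 4 = 0.875, which would then be compared with the preset threshold.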
Corresponding to the above embodiment of the block chain-based infringement detection method, this specification also provides an embodiment of a block chain-based infringement detection apparatus. The apparatus embodiment can be applied to an electronic device. It may be implemented by software, by hardware, or by a combination of hardware and software. Taking a software implementation as an example, the apparatus, as a logical device, is formed when the processor of the electronic device where it is located reads the corresponding computer program instructions from the nonvolatile memory into the memory and runs them. In terms of hardware, fig. 5 shows the hardware structure of the electronic device in which the block chain-based infringement detection apparatus of this specification is located. In addition to the processor, the memory, the network interface, and the nonvolatile memory shown in fig. 5, the electronic device in which the apparatus is located may include other hardware according to its actual function, which is not described again.
Referring to fig. 6, fig. 6 is a block diagram illustrating an infringement detection apparatus based on a block chain according to an exemplary embodiment of the present disclosure. The device can be applied to electronic equipment and can comprise:
a determination module 601, which determines the work information of the registered original works; the work information comprises the work content attribute and the work content type of the original work;
the acquisition module 602 searches for at least one resource site matched with the type of the content of the creative work in a preset resource site list, monitors the at least one resource site, and acquires the webpage content of a webpage when monitoring that the webpage of any resource site contains the content attribute of the creative work;
the calculation module 603 is configured to calculate similarity between a first clustering result obtained by clustering word segmentation vectors to be detected, which are obtained by performing word segmentation on the web page content, and a second clustering result obtained by clustering original word segmentation vectors obtained by performing word segmentation on the original works, and determine infringement similarity between the original works and the web page content based on the similarity;
the detection module 604 is configured to perform infringement detection on the web page content and the creative work according to the infringement similarity, and issue an infringement detection result to the block chain for evidence storage.
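What the calculation module 603 does can be illustrated with a toy sketch. It assumes the word segmentation vectors are already available as lists of floats, uses a small deterministic k-means (initialized from the first k vectors) as the clustering step, and compares the two clustering results by the average best cosine match between centroids. All of these choices are assumptions made for illustration and are not stated in the specification.

```python
import math


def cosine(u, v):
    """Cosine similarity (0.0 if either vector is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def kmeans(points, k, iters=20):
    """Tiny k-means; returns k centroids. Requires len(points) >= k."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest centroid (squared distance).
            idx = min(range(k), key=lambda i: sum(
                (a - b) ** 2 for a, b in zip(p, centroids[i])))
            groups[idx].append(p)
        # Recompute centroids; an empty cluster keeps its old centroid.
        new = [[sum(col) / len(g) for col in zip(*g)] if g else centroids[i]
               for i, g in enumerate(groups)]
        if new == centroids:
            break
        centroids = new
    return centroids


def clustering_similarity(centroids_a, centroids_b):
    """Average, over centroids_a, of the best cosine match in centroids_b."""
    best = [max(cosine(a, b) for b in centroids_b) for a in centroids_a]
    return sum(best) / len(best)
```

Comparing cluster centroids instead of every pair of word vectors keeps the comparison cost independent of document length, which is one plausible motivation for clustering first.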
Optionally, the detecting module 604 issues the web page content collecting process of the web page to the block chain for evidence storage.
Optionally, the infringement detection result includes: a calculation result of an infringement detection calculation, and/or a calculation process of an infringement detection calculation.
Optionally, the detecting module 604 further detects whether the text similarity between the original work and the web page content exceeds a preset second threshold if the infringement similarity exceeds a preset first threshold; if yes, determining the webpage content as an infringement product; if not, determining that the webpage content is not an infringement product; and if the infringement similarity does not exceed a preset first threshold value, determining that the webpage content is not an infringement product.
Optionally, the resource site includes a Web site.
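The optional two-threshold check performed by the detection module can be sketched as follows. The function name and the callable `text_similarity_fn` are hypothetical. Passing the text similarity as a callable is a design assumption that lets the second comparison be skipped when the first threshold is not exceeded.

```python
def detect_infringement(infringement_similarity, text_similarity_fn,
                        first_threshold, second_threshold):
    """Two-stage decision: the text similarity is only computed and checked
    when the infringement similarity exceeds the first threshold."""
    if infringement_similarity <= first_threshold:
        return False  # not an infringing work
    return text_similarity_fn() > second_threshold
```

Only content that passes both stages is determined to be infringing, so the first stage acts as a coarse filter and the second as a confirmation.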
Further, the present specification provides an electronic device comprising: a processor; a memory for storing processor-executable instructions; and the processor executes the executable instructions to realize the block chain-based infringement detection method.
Further, the present specification provides a computer readable storage medium having stored thereon computer instructions, wherein the instructions, when executed by a processor, implement a block chain based infringement detection method.
The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. A typical implementation device is a computer, which may take the form of a personal computer, laptop computer, cellular telephone, camera phone, smart phone, personal digital assistant, media player, navigation device, email messaging device, game console, tablet computer, wearable device, or a combination of any of these devices.
In a typical configuration, a computer includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include forms of volatile memory in a computer readable medium, Random Access Memory (RAM) and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic disk storage, quantum memory, graphene-based storage media or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media, such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
The terminology used in the description of the one or more embodiments is for the purpose of describing the particular embodiments only and is not intended to be limiting of the description of the one or more embodiments. As used in one or more embodiments of the present specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used in one or more embodiments of the present description to describe various information, such information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of one or more embodiments herein. The word "if" as used herein may be interpreted as "at … …" or "when … …" or "in response to a determination", depending on the context.
The above description is only for the purpose of illustrating the preferred embodiments of the one or more embodiments of the present disclosure, and is not intended to limit the scope of the one or more embodiments of the present disclosure, and any modifications, equivalent substitutions, improvements, etc. made within the spirit and principle of the one or more embodiments of the present disclosure should be included in the scope of the one or more embodiments of the present disclosure.

Claims (12)

1. A block chain based infringement detection method, the method comprising:
determining the work information of the registered original works; the work information comprises the work content attribute and the work content type of the original work;
searching at least one resource site matched with the content type of the works of the original works in a preset resource site list, monitoring the at least one resource site, and acquiring the webpage content of a webpage when the webpage of any resource site is monitored to contain the content attribute of the works of the original works;
calculating the similarity of a first clustering result obtained by clustering word segmentation vectors to be detected, which are obtained by carrying out word segmentation on the webpage content, and a second clustering result obtained by clustering original word segmentation vectors obtained by carrying out word segmentation on the original works, and determining the infringement similarity of the original works and the webpage content based on the similarity;
and carrying out infringement detection on the webpage content and the original works according to the infringement similarity, and issuing an infringement detection result to a block chain for evidence preservation.
2. The method of claim 1, further comprising:
and issuing the webpage content acquisition process of the webpage to the block chain for evidence storage.
3. The method of claim 1, the infringement detection result comprising: a calculation result of an infringement detection calculation, and/or a calculation process of an infringement detection calculation.
4. The method of claim 1, wherein the infringing detection of the web page content and the creative work according to the infringement similarity comprises:
if the infringement similarity exceeds a preset first threshold, further detecting whether the text similarity of the creative works and the webpage content exceeds a preset second threshold; if yes, determining the webpage content as an infringement product; if not, determining that the webpage content is not an infringement product;
and if the infringement similarity does not exceed a preset first threshold value, determining that the webpage content is not an infringement product.
5. The method of claim 1, the resource site comprising a Web site.
6. An apparatus for block chain based infringement detection, the apparatus comprising:
the determining module is used for determining the work information of the registered original works; the work information comprises the work content attribute and the work content type of the original work;
the acquisition module is used for searching at least one resource site matched with the content type of the creative works in a preset resource site list, monitoring the at least one resource site, and acquiring the webpage content of a webpage when the webpage of any resource site is monitored to contain the content attribute of the creative works;
the calculation module is used for calculating the similarity of a first clustering result obtained by clustering word segmentation vectors to be detected, which are obtained by carrying out word segmentation on the webpage content, and a second clustering result obtained by clustering original word segmentation vectors obtained by carrying out word segmentation on the original works, and determining the infringement similarity between the original works and the webpage content based on the similarity;
and the detection module is used for carrying out infringement detection on the webpage content and the original works according to the infringement similarity and issuing an infringement detection result to the block chain for evidence storage.
7. The apparatus of claim 6, the detection module to issue a web content collection procedure for the web page to the blockchain for credentialing.
8. The apparatus of claim 6, the infringement detection result comprising: a calculation result of an infringement detection calculation, and/or a calculation process of an infringement detection calculation.
9. The device of claim 6, wherein the detection module further detects whether the text similarity between the creative work and the web content exceeds a preset second threshold if the infringement similarity exceeds a preset first threshold; if yes, determining the webpage content as an infringement product; if not, determining that the webpage content is not an infringement product; and if the infringement similarity does not exceed a preset first threshold value, determining that the webpage content is not an infringement product.
10. The apparatus of claim 6, the resource site comprising a Web site.
11. An electronic device, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor implements the method of any one of claims 1-5 by executing the executable instructions.
12. A computer readable storage medium having stored thereon computer instructions which, when executed by a processor, carry out the steps of the method according to any one of claims 1 to 5.
CN202010039286.XA 2020-01-15 2020-01-15 Infringement detection method, device and equipment based on block chain and storage medium Pending CN110851761A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010039286.XA CN110851761A (en) 2020-01-15 2020-01-15 Infringement detection method, device and equipment based on block chain and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010039286.XA CN110851761A (en) 2020-01-15 2020-01-15 Infringement detection method, device and equipment based on block chain and storage medium

Publications (1)

Publication Number Publication Date
CN110851761A true CN110851761A (en) 2020-02-28

Family

ID=69610723

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010039286.XA Pending CN110851761A (en) 2020-01-15 2020-01-15 Infringement detection method, device and equipment based on block chain and storage medium

Country Status (1)

Country Link
CN (1) CN110851761A (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111737522A (en) * 2020-08-14 2020-10-02 支付宝(杭州)信息技术有限公司 Video matching method, and block chain-based infringement evidence-saving method and device
CN112000929A (en) * 2020-07-29 2020-11-27 广州智城科技有限公司 Cross-platform data analysis method, system, equipment and readable storage medium
CN112000928A (en) * 2020-07-15 2020-11-27 西安电子科技大学 Picture distributed infringement right confirming method, system, storage medium and computer equipment
CN112182329A (en) * 2020-09-14 2021-01-05 浙江数秦科技有限公司 Network picture infringement monitoring and automatic evidence obtaining method
CN112650978A (en) * 2020-08-14 2021-04-13 支付宝(杭州)信息技术有限公司 Infringement detection method and device based on block chain and electronic equipment
CN113935850A (en) * 2021-10-19 2022-01-14 平安普惠企业管理有限公司 Data processing method and device, computer equipment and storage medium
CN114359590A (en) * 2021-12-06 2022-04-15 支付宝(杭州)信息技术有限公司 NFT image work infringement detection method and device and computer storage medium
CN115905913A (en) * 2022-10-14 2023-04-04 支付宝(杭州)信息技术有限公司 Method and device for detecting digital collection

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160246978A1 (en) * 2015-02-23 2016-08-25 Samsung Electronics Co., Ltd. Electronic Device and Method for Providing DRM Content by Electronic Device
CN106446148A (en) * 2016-09-21 2017-02-22 中国运载火箭技术研究院 Cluster-based text duplicate checking method
CN107832384A (en) * 2017-10-28 2018-03-23 北京安妮全版权科技发展有限公司 Infringement detection method, device, storage medium and electronic equipment
CN108804641A (en) * 2018-06-05 2018-11-13 鼎易创展咨询(北京)有限公司 A kind of computational methods of text similarity, device, equipment and storage medium
CN110472201A (en) * 2019-07-26 2019-11-19 阿里巴巴集团控股有限公司 Based on the text similarity detection method and device of block chain, electronic equipment

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112000928A (en) * 2020-07-15 2020-11-27 西安电子科技大学 Picture distributed infringement right confirming method, system, storage medium and computer equipment
CN112000929A (en) * 2020-07-29 2020-11-27 广州智城科技有限公司 Cross-platform data analysis method, system, equipment and readable storage medium
CN111737522A (en) * 2020-08-14 2020-10-02 支付宝(杭州)信息技术有限公司 Video matching method, and block chain-based infringement evidence-saving method and device
CN112650978A (en) * 2020-08-14 2021-04-13 支付宝(杭州)信息技术有限公司 Infringement detection method and device based on block chain and electronic equipment
WO2022033252A1 (en) * 2020-08-14 2022-02-17 支付宝(杭州)信息技术有限公司 Video matching method and apparatus, and blockchain-based infringement evidence storage method and apparatus
US11954152B2 (en) 2020-08-14 2024-04-09 Alipay (Hangzhou) Information Technology Co., Ltd. Video matching methods and apparatuses, and blockchain-based infringement evidence storage methods and apparatuses
CN112182329A (en) * 2020-09-14 2021-01-05 浙江数秦科技有限公司 Network picture infringement monitoring and automatic evidence obtaining method
CN112182329B (en) * 2020-09-14 2023-04-18 浙江数秦科技有限公司 Network picture infringement monitoring and automatic evidence obtaining method
CN113935850A (en) * 2021-10-19 2022-01-14 平安普惠企业管理有限公司 Data processing method and device, computer equipment and storage medium
CN114359590A (en) * 2021-12-06 2022-04-15 支付宝(杭州)信息技术有限公司 NFT image work infringement detection method and device and computer storage medium
CN115905913A (en) * 2022-10-14 2023-04-04 支付宝(杭州)信息技术有限公司 Method and device for detecting digital collection
CN115905913B (en) * 2022-10-14 2024-03-12 支付宝(杭州)信息技术有限公司 Method and device for detecting digital collection

Similar Documents

Publication Publication Date Title
CN110851761A (en) Infringement detection method, device and equipment based on block chain and storage medium
CN107622333B (en) Event prediction method, device and system
WO2023124204A1 (en) Anti-fraud risk assessment method and apparatus, training method and apparatus, and readable storage medium
CN109978060B (en) Training method and device of natural language element extraction model
Jain et al. Machine Learning based Fake News Detection using linguistic features and word vector features
CN110851608A (en) Infringement detection method, device and equipment based on block chain and storage medium
CN111460783B (en) Data processing method and device, computer equipment and storage medium
CN115203440A (en) Event map construction method and device for time-space dynamic data and electronic equipment
CN110851797A (en) Block chain-based work creation method and device and electronic equipment
Hu et al. Deep self-taught learning for detecting drug abuse risk behavior in tweets
CN112132238A (en) Method, device, equipment and readable medium for identifying private data
KR20180129001A (en) Method and System for Entity summarization based on multilingual projected entity space
CN115188067A (en) Video behavior identification method and device, electronic equipment and storage medium
CN109271624A (en) Target word determination method, apparatus and storage medium
Almaguer-Angeles et al. Choosing machine learning algorithms for anomaly detection in smart building iot scenarios
Nirav Shah et al. A systematic literature review and existing challenges toward fake news detection models
Hanshal et al. RETRACTED ARTICLE: Hybrid deep learning model for automatic fake news detection
Villanueva et al. Application of Natural Language Processing for Phishing Detection Using Machine and Deep Learning Models
CN114357203B (en) Multimedia retrieval method and device and computer equipment
CN115563296A (en) Fusion detection method and system based on content semantics
CN110198309A (en) Web server recognition method, device, terminal and storage medium
Bhoj et al. LSTM powered identification of clickbait content on entertainment and news websites
Kasnesis et al. A prototype deep learning paraphrase identification service for discovering information cascades in social networks
Fernández-Pedauye et al. Enhancing the spaCy named entity recognizer for crowdsensing
CN117909505B (en) Event argument extraction method and related equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40024788

Country of ref document: HK

RJ01 Rejection of invention patent application after publication

Application publication date: 20200228