CN110457434B - Webpage evidence obtaining method and device based on search, readable storage medium and server - Google Patents

Info

Publication number
CN110457434B
Authority
CN
China
Prior art keywords
webpage
text
evidence
picture
evidence obtaining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910652647.5A
Other languages
Chinese (zh)
Other versions
CN110457434A (en
Inventor
胡俊
盛思思
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN201910652647.5A
Priority to PCT/CN2019/118134 (WO2021012521A1)
Publication of CN110457434A
Application granted
Publication of CN110457434B

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/953Querying, e.g. by the use of web search engines

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Library & Information Science (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention belongs to the technical field of computers, and particularly relates to a webpage evidence obtaining method and device based on searching, a computer readable storage medium and a server. The method comprises the steps of receiving a webpage evidence obtaining request sent by terminal equipment, wherein the webpage evidence obtaining request comprises an evidence obtaining object to be searched, and the evidence obtaining object comprises a text object or a picture object; extracting the evidence obtaining object from the webpage evidence obtaining request, and judging the type of the evidence obtaining object; if the evidence obtaining object is a text object, searching a first webpage evidence in a network through a search engine, and collecting an image of a webpage where the first webpage evidence is located; if the evidence obtaining object is a picture object, searching a second webpage evidence in the network through a search engine, and collecting an image of a webpage where the second webpage evidence is located. By the embodiment of the invention, the user can be helped to actively obtain evidence in massive network information, infringement behavior can be found timely, and legal benefits of the user are comprehensively ensured.

Description

Webpage evidence obtaining method and device based on search, readable storage medium and server
Technical Field
The invention belongs to the technical field of computers, and particularly relates to a webpage evidence obtaining method and device based on searching, a computer readable storage medium and a server.
Background
With the spread of internet technology, more and more information content is moving from printed matter to web pages, and this mass of webpage information contains a great deal of content that can serve as evidence in lawsuits. For example, products sold by merchants in online shops may be counterfeit or infringing, and articles published on some websites may be the work of other authors. When a user finds such evidence, the user can collect it by taking a screenshot or photograph for subsequent litigation or rights protection. This, however, is a passive evidence obtaining method: it is generally used only when the user happens to discover an infringement, so it depends heavily on chance and makes it difficult to protect the user's legal rights and interests in a timely and comprehensive manner.
Disclosure of Invention
In view of the above, the embodiments of the present invention provide a search-based webpage evidence obtaining method and device, a computer readable storage medium, and a server, so as to solve the problem that the conventional passive evidence obtaining method depends heavily on chance and makes it difficult to protect the legal rights and interests of users in a timely and comprehensive manner.
A first aspect of an embodiment of the present invention provides a search-based webpage evidence obtaining method, which may include:
receiving a webpage evidence obtaining request sent by terminal equipment, wherein the webpage evidence obtaining request comprises an evidence obtaining object to be searched, and the evidence obtaining object comprises a text object or a picture object;
extracting the evidence obtaining object from the webpage evidence obtaining request, and judging the type of the evidence obtaining object;
if the evidence obtaining object is a text object, searching a first webpage evidence in a network through a preset search engine, and collecting an image of a webpage where the first webpage evidence is located, wherein the first webpage evidence is a webpage text with a text similarity with the text object being greater than a preset first threshold;
if the evidence obtaining object is a picture object, searching a second webpage evidence in a network through the search engine, and collecting an image of a webpage where the second webpage evidence is located, wherein the second webpage evidence is a webpage picture with the image similarity with the picture object being larger than a preset second threshold value.
A second aspect of an embodiment of the present invention provides a web page evidence obtaining apparatus, which may include:
an evidence obtaining request receiving module, used for receiving a webpage evidence obtaining request sent by a terminal device, wherein the webpage evidence obtaining request comprises an evidence obtaining object to be searched, and the evidence obtaining object comprises a text object or a picture object;
The evidence obtaining object extracting module is used for extracting the evidence obtaining object from the webpage evidence obtaining request and judging the type of the evidence obtaining object;
the first search module is used for searching first webpage evidence in a network through a preset search engine if the evidence obtaining object is a text object, and collecting images of webpages where the first webpage evidence is located, wherein the first webpage evidence is a webpage text with the text similarity with the text object being greater than a preset first threshold;
and the second search module is used for searching second webpage evidence in a network through the search engine if the evidence obtaining object is a picture object, and collecting images of webpages where the second webpage evidence is located, wherein the second webpage evidence is a webpage picture with the image similarity with the picture object being larger than a preset second threshold value.
A third aspect of embodiments of the present invention provides a computer readable storage medium storing computer readable instructions which when executed by a processor perform the steps of:
receiving a webpage evidence obtaining request sent by terminal equipment, wherein the webpage evidence obtaining request comprises an evidence obtaining object to be searched, and the evidence obtaining object comprises a text object or a picture object;
Extracting the evidence obtaining object from the webpage evidence obtaining request, and judging the type of the evidence obtaining object;
if the evidence obtaining object is a text object, searching a first webpage evidence in a network through a preset search engine, and collecting an image of a webpage where the first webpage evidence is located, wherein the first webpage evidence is a webpage text with a text similarity with the text object being greater than a preset first threshold;
if the evidence obtaining object is a picture object, searching a second webpage evidence in a network through the search engine, and collecting an image of a webpage where the second webpage evidence is located, wherein the second webpage evidence is a webpage picture with the image similarity with the picture object being larger than a preset second threshold value.
A fourth aspect of the embodiments of the present invention provides a server comprising a memory, a processor, and computer readable instructions stored in the memory and executable on the processor, the processor executing the computer readable instructions to perform the steps of:
receiving a webpage evidence obtaining request sent by terminal equipment, wherein the webpage evidence obtaining request comprises an evidence obtaining object to be searched, and the evidence obtaining object comprises a text object or a picture object;
Extracting the evidence obtaining object from the webpage evidence obtaining request, and judging the type of the evidence obtaining object;
if the evidence obtaining object is a text object, searching a first webpage evidence in a network through a preset search engine, and collecting an image of a webpage where the first webpage evidence is located, wherein the first webpage evidence is a webpage text with a text similarity with the text object being greater than a preset first threshold;
if the evidence obtaining object is a picture object, searching a second webpage evidence in a network through the search engine, and collecting an image of a webpage where the second webpage evidence is located, wherein the second webpage evidence is a webpage picture with the image similarity with the picture object being larger than a preset second threshold value.
Compared with the prior art, the embodiments of the invention have the following beneficial effects. A server for webpage evidence obtaining (the evidence obtaining server, which is the implementation subject of the embodiments) is preset, and a user can send a webpage evidence obtaining request to it through a terminal device, so that the evidence obtaining server automatically searches the network for webpage evidence containing content that may infringe the user's rights. For example, if the object the user wants to protect is a text object, the text object can be carried in the webpage evidence obtaining request; the evidence obtaining server then automatically searches the network for webpage text similar to the text object and collects an image of the webpage where that text is located as webpage evidence. If the object the user wants to protect is a picture object, the picture object can be carried in the webpage evidence obtaining request; the evidence obtaining server then automatically searches the network for webpage pictures similar to the picture object and collects an image of the webpage where such a picture is located as webpage evidence. By the embodiment of the invention, the user can be helped to actively obtain evidence in massive network information, infringement can be found in time, and the legal rights and interests of the user are comprehensively protected.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of one embodiment of a search-based web page evidence obtaining method in accordance with an embodiment of the present invention;
FIG. 2 is a schematic flow diagram of searching a network for evidence of a first web page by a search engine;
FIG. 3 is a schematic flow chart of searching for evidence of a second web page in a network by a search engine;
FIG. 4 is a diagram illustrating an exemplary configuration of a web page evidence obtaining apparatus according to an exemplary embodiment of the present invention;
FIG. 5 is a schematic block diagram of a server according to an embodiment of the present invention.
Detailed Description
In order to make the objects, features and advantages of the present invention more comprehensible, the technical solutions in the embodiments of the present invention are described in detail below with reference to the accompanying drawings, and it is apparent that the embodiments described below are only some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to fig. 1, an embodiment of a search-based webpage evidence obtaining method according to an embodiment of the present invention may include:
Step S101, receiving a webpage evidence obtaining request sent by a terminal device.
The webpage evidence obtaining request comprises an evidence obtaining object to be searched, and the evidence obtaining object comprises a text object or a picture object.
In this embodiment, a server for webpage evidence obtaining is preset, referred to below as the evidence obtaining server. The evidence obtaining server is the implementation subject of this embodiment and the core of the whole evidence obtaining system. It may be set up by a court or by other units or organizations authorized by a court. The evidence obtaining system can provide users with platform interfaces such as application programs (APPs), web pages and official accounts on social platforms; after registering on any of these platform interfaces through a mobile phone, tablet computer or other terminal device, a user can use the webpage evidence obtaining service provided by the system.
Because this embodiment is mainly applied to legal litigation, the real identity information of the evidence provider must be obtained to meet the requirements of subsequent litigation. Before using the evidence obtaining system, the user therefore needs to pass real-name authentication by providing identity documents for verification, and to leave contact details such as a telephone number and mailbox for subsequent communication.
When a user wants to determine whether a text object (generally copyright protection for papers, novels, articles and the like) or a picture object (generally protection for artwork, industrial design and product appearance) owned by the user is at risk of being infringed by others, the user can send a webpage evidence obtaining request to the evidence obtaining server through the user's terminal device. Specifically, the user first finds the page for submitting a webpage evidence obtaining request in a platform interface provided by the evidence obtaining system, and uploads the text object or the picture object in the specified area of that page. After filling in the related information, the user clicks the submit button to send the webpage evidence obtaining request to the evidence obtaining server. The request carries the identity information of the user and the evidence obtaining object to be searched (a text object or a picture object).
Step S102, extracting the evidence obtaining object from the webpage evidence obtaining request, and judging the type of the evidence obtaining object.
After receiving the webpage evidence obtaining request, the evidence obtaining server can extract the evidence obtaining object from the webpage evidence obtaining request and judge the type of the evidence obtaining object, namely whether the evidence obtaining object is a text object or a picture object. If the evidence obtaining object is a text object, step S103 is executed, and if the evidence obtaining object is a picture object, step S104 is executed.
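The type judgment in step S102 can be sketched as follows. This is only an illustration under stated assumptions: the `ForensicsRequest` class, its field names, and the convention that a text object arrives as `str` while a picture object arrives as raw bytes are all hypothetical and are not specified in this embodiment.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class ForensicsRequest:
    """Hypothetical request shape: user identity plus the object to search for."""
    user_id: str
    forensic_object: Union[str, bytes]  # assumption: text -> str, picture -> bytes


def judge_type(obj: Union[str, bytes]) -> str:
    """Return "text" or "picture" for the extracted evidence obtaining object."""
    if isinstance(obj, str):
        return "text"
    if isinstance(obj, (bytes, bytearray)):
        return "picture"
    raise ValueError("unsupported evidence obtaining object")


def next_step(request: ForensicsRequest) -> str:
    """Text objects go to step S103, picture objects go to step S104."""
    return "S103" if judge_type(request.forensic_object) == "text" else "S104"
```

A real server would more likely inspect an uploaded file's declared content type; the `isinstance` check here only stands in for that decision.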
Step S103, searching a first webpage evidence in a network through a preset search engine, and collecting images of webpages where the first webpage evidence is located.
The first webpage evidence is webpage text with the text similarity with the text object larger than a preset first threshold value.
Text objects such as papers, novels and articles are generally long and cannot be searched directly. In a specific implementation of this embodiment, as shown in fig. 2, a keyword combination from a key paragraph of the whole text may be selected for searching. The specific process is as follows:
Step S1031, performing word segmentation processing on the text object to obtain each word forming the text object.
Word segmentation processing refers to splitting text information into individual words. In this embodiment, the text information can be segmented according to a general dictionary, so that the resulting words are normal words; a string that is not in the dictionary is split into single characters. When words can be formed in both the forward and backward directions, for example "ABC", the split is decided by statistical word frequency: if the word frequency of "AB" is higher, the result is "AB/C"; if the word frequency of "BC" is higher, the result is "A/BC".
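One possible way to realize dictionary-based segmentation with frequency-based disambiguation is a small dynamic program over a unigram word-frequency table. The sketch below is an assumption about how this could be done, not the method prescribed by the embodiment; the function name `segment`, the `max_len` limit and the penalty given to out-of-dictionary characters are all illustrative choices.

```python
import math


def segment(text, word_freq, max_len=4):
    """Split text into dictionary words, preferring higher-frequency words.

    word_freq maps dictionary words to their statistical frequency.
    A character that is not covered by any dictionary word is emitted
    as a single-character token with a low fixed score.
    """
    total = sum(word_freq.values())
    unknown_score = math.log(0.5 / total)  # assumed penalty for unknown characters
    # best[i] holds (score, tokens) for the best segmentation of text[:i]
    best = [(0.0, [])] + [(-math.inf, [])] * len(text)
    for end in range(1, len(text) + 1):
        for start in range(max(0, end - max_len), end):
            piece = text[start:end]
            if piece in word_freq:
                score = math.log(word_freq[piece] / total)
            elif len(piece) == 1:
                score = unknown_score
            else:
                continue  # multi-character strings must be in the dictionary
            candidate = best[start][0] + score
            if candidate > best[end][0]:
                best[end] = (candidate, best[start][1] + [piece])
    return best[len(text)][1]
```

With a frequency table in which "AB" is more frequent than "BC", `segment("ABC", ...)` yields `["AB", "C"]`; reversing the frequencies yields `["A", "BC"]`, which matches the behaviour described above.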
Step S1032, counting the occurrence frequency of each word forming the text object in a preset corpus.
The corpus comprises a preset number of text messages.
In this embodiment, a corpus containing various text information may be constructed in advance. To ensure the accuracy of the statistical result, the corpus should be kept as large as possible, that is, it should contain as many pieces of text information as possible; for example, 50,000, 100,000, 200,000 or some other number of pieces of text information may be stored in the corpus according to the actual situation.
After the text object is segmented, the frequency with which each resulting word occurs in the corpus can be counted. The more frequently a word occurs in the corpus, the more commonly it is used, the less distinctive it is, and the less suitable it is as a search keyword. Conversely, the less frequently a word occurs in the corpus, the rarer it is, the more distinctive it is, and the more suitable it is as a search keyword.
Step S1033, selecting, from the words forming the text object, the CN words that occur least frequently in the corpus as candidate keywords.
CN is an integer greater than 1, and its specific value may be set according to practical situations, for example, it may be set to 20, 30, 50 or other values.
In another specific implementation of this embodiment, after the frequency statistics is performed, the word with the frequency smaller than the preset threshold may be used as the candidate keyword, where the threshold may be set according to the actual situation, for example, may be set to 10 times, 20 times, 50 times, or other values.
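Steps S1032 and S1033, together with the threshold-based variant, can be sketched as follows. The sketch assumes that the corpus is already segmented into lists of words; the function names are illustrative and do not come from the embodiment.

```python
from collections import Counter


def corpus_frequencies(corpus_docs):
    """Count how often each word occurs across all corpus documents.

    corpus_docs is an iterable of documents, each already split into words.
    """
    freq = Counter()
    for doc in corpus_docs:
        freq.update(doc)
    return freq


def rarest_words(words, freq, cn):
    """Return the cn distinct words of the text object that are rarest in the corpus."""
    # Ties are broken alphabetically so the result is deterministic.
    ordered = sorted(set(words), key=lambda w: (freq[w], w))
    return ordered[:cn]


def words_below_threshold(words, freq, threshold):
    """Variant: keep every distinct word whose corpus frequency is below threshold."""
    return sorted({w for w in words if freq[w] < threshold})
```

A word that never occurs in the corpus has frequency 0 in the `Counter` and is therefore ranked as the rarest.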
Step S1034, counting the total number of candidate keywords included in each text paragraph of the text object.
It should be noted that the count is only made once for the candidate keywords that repeatedly appear. For example, in a certain text paragraph, candidate keyword a appears 5 times, candidate keyword B appears 8 times, candidate keyword C appears 15 times, but each candidate keyword is counted only once, so the total number of candidate keywords included in the text paragraph is 3, which is finally counted.
Step S1035, selecting a reference text paragraph from the text paragraphs of the text object.
The reference text paragraph is the text paragraph containing the largest total number of candidate keywords, and it serves as the object against which text similarity is compared.
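Steps S1034 and S1035 can be illustrated with the following sketch, which assumes each paragraph has already been segmented into a list of words. Repeated occurrences of a candidate keyword are counted once by using set intersection. The function names are illustrative only.

```python
def count_candidates(paragraph_words, candidates):
    """Number of distinct candidate keywords that appear in one paragraph."""
    return len(set(paragraph_words) & set(candidates))


def pick_reference_paragraph(paragraphs, candidates):
    """Index of the paragraph containing the most distinct candidate keywords.

    If several paragraphs tie, the earliest one is returned (an assumption;
    the tie-breaking rule is not specified in the embodiment).
    """
    return max(range(len(paragraphs)),
               key=lambda i: count_candidates(paragraphs[i], candidates))
```

For a paragraph in which keyword A appears 5 times, keyword B 8 times and keyword C 15 times, `count_candidates` returns 3, as in the example given for step S1034.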
Step S1036, searching the web page texts containing the preferred keywords in the network through the search engine.
The preferred keywords are the candidate keywords that occur in the reference text paragraph. The preferred keywords are combined into a search keyword combination, and the search is performed in the network using that combination. For example, if the reference text paragraph contains 4 candidate keywords, namely keyword A, keyword B, keyword C and keyword D, the search formula finally input into the search engine is:
keyword A and keyword B and keyword C and keyword D.
Through the search formula, each webpage text to be screened and compared can be searched in the network.
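Building the search formula for step S1036 could look like the sketch below. The uppercase `AND` operator and the double quotes around each keyword are assumptions about the query syntax; the operators actually accepted depend on the search engine used.

```python
def build_search_formula(paragraph_words, candidates):
    """Join the candidate keywords found in the reference paragraph with AND.

    Keywords keep the order of their first appearance and are not repeated.
    """
    preferred = []
    for word in paragraph_words:
        if word in candidates and word not in preferred:
            preferred.append(word)
    return " AND ".join(f'"{word}"' for word in preferred)
```

For a reference paragraph containing keywords A, B, C and D, the result is `"A" AND "B" AND "C" AND "D"`.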
Step S1037, selecting, from the web page texts, a web page text with a text similarity with the reference text paragraph greater than the first threshold value as the first web page evidence.
Each searched webpage text is compared with the reference text paragraph one by one. If the text similarity between the two is smaller than or equal to the first threshold, the two are judged not to match and no evidence needs to be collected. If the text similarity between the two is larger than the first threshold, the two are judged to match; the webpage text is then taken as the first webpage evidence, and an image of the webpage where the first webpage evidence is located is collected as an evidence image for use in the subsequent litigation process.
The text similarity between two texts is a ratio of the number of the same characters between the two texts to the total number of characters of one text, and the specific value of the first threshold may be set according to practical situations, for example, may be set to 90%, 95%, 98% or other values.
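The similarity test of step S1037 can be sketched as follows. The definition above leaves open how "the same characters" are counted; this sketch assumes a multiset overlap of characters and divides by the length of the reference text paragraph. Both choices are assumptions made for illustration.

```python
from collections import Counter


def text_similarity(reference, candidate):
    """Shared character count (as a multiset) divided by the reference length."""
    if not reference:
        return 0.0
    shared = sum((Counter(reference) & Counter(candidate)).values())
    return shared / len(reference)


def is_first_webpage_evidence(reference, candidate, first_threshold=0.9):
    """A webpage text matches only if its similarity is strictly above the threshold."""
    return text_similarity(reference, candidate) > first_threshold
```

Because a multiset overlap ignores character order, a production system would likely combine it with an order-aware comparison; that refinement is outside what the text specifies.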
By adopting the mode of fig. 2, the combination of some more uncommon words used in the text information is adopted to search, so that irrelevant network information can be screened out to the greatest extent, and the workload of text comparison is greatly reduced.
Step S104, searching the second webpage evidence in the network through the search engine, and collecting images of the webpage where the second webpage evidence is located.
The second webpage evidence is a webpage picture with the image similarity with the picture object being larger than a preset second threshold value.
As shown in fig. 3, step S104 may specifically include the following procedures:
Step S1041, searching each candidate web page picture in the network through the search engine.
In this embodiment, all the web page pictures that can be found may be crawled in the network by the search engine at random, and may be taken as candidate web page pictures, or all the web page pictures that can be found may be crawled only in a specified website (such as some well-known e-commerce websites), and may be taken as candidate web page pictures.
Step S1042, respectively calculating the image similarity between each candidate web page picture and the picture object.
This embodiment may involve a large number of image comparisons. The comparison methods commonly used in the prior art extract feature vectors from the images through the LBP algorithm, the SIFT algorithm or similar algorithms, and use the similarity between the feature vectors as the similarity between the images. Because extracting the feature vectors involves a large amount of computation, these methods consume considerable resources and time. Since images that are more similar have more similar pixel value distributions, this embodiment preferably calculates image similarity from statistics of the pixel value distribution.
First, the distribution ratio of the pixels with the respective color component values in the picture object is calculated, and the distribution ratio of the pixels with the respective color component values in the respective candidate web page pictures is calculated.
For example, the distribution ratio of pixels of each color component value in the picture object may be calculated according to the following formula:
$$\mathrm{StRRatio}_{pv} = \frac{\mathrm{StRPixNum}_{pv}}{PN1}, \qquad \mathrm{StBRatio}_{pv} = \frac{\mathrm{StBPixNum}_{pv}}{PN1}, \qquad \mathrm{StGRatio}_{pv} = \frac{\mathrm{StGPixNum}_{pv}}{PN1}$$
wherein $PN1$ is the total number of pixel points of the picture object; $\mathrm{StRPixNum}_{pv}$, $\mathrm{StBPixNum}_{pv}$ and $\mathrm{StGPixNum}_{pv}$ are the total numbers of pixel points in the picture object whose red, blue and green component value, respectively, is $pv$; $\mathrm{StRRatio}_{pv}$, $\mathrm{StBRatio}_{pv}$ and $\mathrm{StGRatio}_{pv}$ are the distribution ratios of the pixels in the picture object whose red, blue and green component value, respectively, is $pv$; and $0 \le pv \le PVMax$, where $PVMax$ is the maximum pixel value, typically 255.
Similarly, the distribution ratio of the pixels of each color component value in the nth candidate web page picture may be calculated according to the following formula:
$$\mathrm{CdRRatio}_{pv} = \frac{\mathrm{CdRPixNum}_{pv}}{PN2}, \qquad \mathrm{CdBRatio}_{pv} = \frac{\mathrm{CdBPixNum}_{pv}}{PN2}, \qquad \mathrm{CdGRatio}_{pv} = \frac{\mathrm{CdGPixNum}_{pv}}{PN2}$$
wherein $PN2$ is the total number of pixel points of the nth candidate webpage picture; $\mathrm{CdRPixNum}_{pv}$, $\mathrm{CdBPixNum}_{pv}$ and $\mathrm{CdGPixNum}_{pv}$ are the total numbers of pixel points in the nth candidate webpage picture whose red, blue and green component value, respectively, is $pv$; and $\mathrm{CdRRatio}_{pv}$, $\mathrm{CdBRatio}_{pv}$ and $\mathrm{CdGRatio}_{pv}$ are the distribution ratios of the pixels in the nth candidate webpage picture whose red, blue and green component value, respectively, is $pv$.
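The distribution ratios can be computed with a per-channel histogram, as in the following sketch. It assumes that an image is supplied as a sequence of `(red, green, blue)` tuples with integer components in the range 0 to PVMax; decoding an actual image file into such a sequence is left out, and the function name is illustrative.

```python
def distribution_ratios(pixels, pv_max=255):
    """Per-channel share of pixels for every component value 0..pv_max.

    Returns a dict with keys "R", "G", "B"; each value is a list whose entry
    at index pv is (number of pixels with that component value) / (total pixels).
    """
    pixel_count = len(pixels)
    if pixel_count == 0:
        raise ValueError("image has no pixels")
    counts = {channel: [0] * (pv_max + 1) for channel in "RGB"}
    for red, green, blue in pixels:
        counts["R"][red] += 1
        counts["G"][green] += 1
        counts["B"][blue] += 1
    return {channel: [n / pixel_count for n in counts[channel]] for channel in "RGB"}
```

The same function serves for both the picture object (total pixel count PN1) and each candidate web page picture (total pixel count PN2), since each ratio is normalized by the pixel count of its own image.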
Then, the image similarity SimDeg between the nth candidate web page picture and the picture object is calculated from the quantities $\mathrm{DiffRatio}_{pv}$, wherein
$$\mathrm{DiffRatio}_{pv} = (\mathrm{StRRatio}_{pv} - \mathrm{CdRRatio}_{pv})^2 + (\mathrm{StBRatio}_{pv} - \mathrm{CdBRatio}_{pv})^2 + (\mathrm{StGRatio}_{pv} - \mathrm{CdGRatio}_{pv})^2$$
It can be seen that the more similar the pixel value distributions of the two pictures are, the higher the image similarity between the two pictures is.
In this way, pixel value distribution statistics are used to replace feature vector calculation to realize image comparison, so that the calculation amount is greatly reduced.
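The sketch below accumulates DiffRatio over all component values and maps the total to a score between 0 and 1. The text here does not give the closed-form expression for SimDeg, so the linear mapping `1 - total / 6` is purely an assumption chosen for illustration; 6 is the largest value the total can take, reached when all three channel distributions are concentrated on different values in the two pictures. Inputs are per-channel ratio lists keyed by "R", "G" and "B".

```python
def diff_ratio_total(st_ratios, cd_ratios):
    """Sum of DiffRatio over every component value pv."""
    total = 0.0
    for pv in range(len(st_ratios["R"])):
        total += ((st_ratios["R"][pv] - cd_ratios["R"][pv]) ** 2
                  + (st_ratios["B"][pv] - cd_ratios["B"][pv]) ** 2
                  + (st_ratios["G"][pv] - cd_ratios["G"][pv]) ** 2)
    return total


def assumed_similarity(st_ratios, cd_ratios):
    """ASSUMED mapping from the DiffRatio total to a similarity in [0, 1].

    Not the formula of the embodiment: identical distributions give 1.0,
    completely disjoint distributions give 0.0.
    """
    return 1.0 - diff_ratio_total(st_ratios, cd_ratios) / 6.0
```

Whatever mapping is used, it should decrease as the DiffRatio total grows, so that more similar pixel value distributions produce a higher similarity.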
Step S1043, selecting, from each candidate web page picture, a web page picture with an image similarity with the picture object greater than the second threshold as the second web page evidence.
The specific value of the second threshold may be set according to practical situations, for example, may be set to 90%, 95%, 98% or other values. After the second webpage evidence is selected, the evidence obtaining server collects an image of the webpage where the second webpage evidence is located as an evidence image for use in the subsequent litigation process.
Further, after the evidence image is acquired, the evidence obtaining server can add a timestamp to the evidence image through a time service system, to prove that the evidence existed at the current time point. A timestamp is the total number of seconds from 00:00:00 on January 1, 1970 Greenwich Mean Time (08:00:00 on January 1, 1970 Beijing time) to the present. It is complete, verifiable data, usually a character sequence, that shows a piece of data existed before a specific time and uniquely identifies a moment in time.
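The definition of the timestamp can be checked with Python's standard `datetime` module. The helper name `to_unix_seconds` is illustrative; the fixed UTC+8 offset is used to represent Beijing time.

```python
from datetime import datetime, timedelta, timezone


def to_unix_seconds(moment):
    """Whole seconds elapsed since 1970-01-01 00:00:00 GMT for an aware datetime."""
    return int(moment.timestamp())


BEIJING = timezone(timedelta(hours=8))  # Beijing time is UTC+8

# The same starting instant expressed in both time zones.
epoch_gmt = datetime(1970, 1, 1, 0, 0, 0, tzinfo=timezone.utc)
epoch_beijing = datetime(1970, 1, 1, 8, 0, 0, tzinfo=BEIJING)
```

Both `epoch_gmt` and `epoch_beijing` map to 0, which is why the two clock readings given above denote the same starting instant.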
First, the evidence obtaining server performs a hash operation on the evidence image to obtain the hash value corresponding to the evidence image.
A hash operation transforms an input of arbitrary length into an output of fixed length, which is the hash value. This transformation is a compressing mapping: the output is typically much shorter than the input, different inputs may hash to the same output, and the input value cannot be uniquely determined from the output value. Put simply, it is a process that compresses a message of any length into a message digest of fixed length. The hash operation used in this embodiment may include, but is not limited to, specific algorithms such as MD4, MD5 and SHA1.
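Hashing the evidence image can be sketched with Python's standard `hashlib` module. The function name and the default choice of SHA1 are illustrative; whether a given algorithm name is available through `hashlib.new` depends on the local Python and OpenSSL build.

```python
import hashlib


def evidence_hash(image_bytes, algorithm="sha1"):
    """Return the hexadecimal digest of the evidence image's raw bytes."""
    return hashlib.new(algorithm, image_bytes).hexdigest()
```

Whatever the size of `image_bytes`, a SHA1 digest is always 40 hexadecimal characters long, which illustrates the fixed-length output described above.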
And then, the evidence obtaining server sends the hash value to the time service system.
In this embodiment, the Joint Trust timestamp service center is preferably used to provide the timestamp service. The Joint Trust timestamp service center is China's third-party trusted timestamp authentication service, built jointly by the National Time Service Center of the Chinese Academy of Sciences and Beijing Joint Trust Technology Service Co., Ltd. The National Time Service Center is responsible for time service and timekeeping monitoring. Because of this timekeeping monitoring, the accuracy of the time in the timestamp certificate is guaranteed and the certificate cannot be tampered with.
Finally, the evidence obtaining server receives the timestamp certificate of the evidence image returned by the time service system, and adds the timestamp certificate to the evidence image to obtain a stamped evidence image.
The timestamp certificate of the evidence image is the data obtained after the time service system digitally signs the hash value together with the system time. After receiving the hash value of the evidence, the time service system appends the time at which the hash value was received, digitally signs the whole, and sends the resulting timestamp certificate to the evidence obtaining server.
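The stamping flow above (hash the image, send the hash, receive a signed certificate, attach it to the image) can be sketched as follows. This is an illustration under stated assumptions, not the time service system's actual interface: `TSA_KEY`, `issue_certificate`, `stamp_evidence` and `verify_certificate` are hypothetical names, and an HMAC over the hash and the time stands in for the digital signature, which a real time service would produce with its private key.

```python
import hashlib
import hmac
import json
import time

# Hypothetical signing key; a stand-in for the time service's private key.
TSA_KEY = b"demo-time-service-key"


def _signed_payload(hash_value, issued_at):
    """Serialize the hash and the time deterministically before signing."""
    return json.dumps({"hash": hash_value, "time": issued_at}, sort_keys=True).encode()


def issue_certificate(hash_value, now=None):
    """Time service side: bind the received hash to the system time and sign both."""
    issued_at = int(time.time()) if now is None else now
    signature = hmac.new(TSA_KEY, _signed_payload(hash_value, issued_at),
                         hashlib.sha256).hexdigest()
    return {"hash": hash_value, "time": issued_at, "signature": signature}


def stamp_evidence(image_bytes):
    """Server side: hash the image, request a certificate, attach it to the image."""
    hash_value = hashlib.sha1(image_bytes).hexdigest()
    return {"image": image_bytes, "certificate": issue_certificate(hash_value)}


def verify_certificate(image_bytes, cert):
    """Check that the certificate matches the image and its signature is intact."""
    if hashlib.sha1(image_bytes).hexdigest() != cert["hash"]:
        return False
    expected = hmac.new(TSA_KEY, _signed_payload(cert["hash"], cert["time"]),
                        hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, cert["signature"])
```

Because only the hash leaves the server, the time service never sees the evidence image itself; changing either the image or the certified time makes verification fail.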
Further, to ensure the security of the evidence, the evidence obtaining server may also upload the stamped evidence image to a designated blockchain system. The blockchain system should be a judicially certified blockchain system with legal effect, and may be a public chain, a consortium chain, or a private chain. The blockchain system generally includes a plurality of nodes; in this embodiment, the evidence obtaining server is one of the writing nodes.
The evidence obtaining server uploads the stamped evidence image to the blockchain system, and a node in the blockchain system obtains the right to write the evidence through a set consensus mechanism, which includes, but is not limited to, PoW, PoS, DPoS, PBFT, a sequential rotation mechanism, or a random selection mechanism. The node that obtains the write right sends the evidence, in the form of a block, to every node in the blockchain system so that each node can verify the block. If verification passes, the block is stored on the blockchain; if verification fails, the block is discarded.
If the block is not confirmed in the blockchain system, a failure result is returned to the evidence obtaining server. Conversely, if the block is confirmed and stored, a success result is returned to the evidence obtaining server, so that the storage state of the information in the blockchain system is always clear and no data is lost. Owing to the distributed storage of the blockchain, all nodes in the blockchain system jointly record the evidence information, so that it cannot be tampered with and is jointly endorsed by the nodes.
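The block verification and the success or failure feedback described above can be illustrated with a toy model. Everything here is an assumption made for illustration: the `Node` class, the block layout, and the rule that every node must accept the block are hypothetical, and the consensus mechanism is not modeled (the first node is simply treated as the one holding the write right).

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(index, prev_hash, evidence_hash):
    """Hash the block fields so that any later change to them is detectable."""
    body = json.dumps({"index": index, "prev": prev_hash, "evidence": evidence_hash},
                      sort_keys=True)
    return hashlib.sha256(body.encode()).hexdigest()


class Node:
    def __init__(self):
        self.chain = []  # this node's copy of the chain, a list of block dicts

    def last_hash(self):
        return self.chain[-1]["hash"] if self.chain else GENESIS

    def verify(self, block):
        """Accept the block only if it extends this node's chain and hashes correctly."""
        return (block["index"] == len(self.chain)
                and block["prev"] == self.last_hash()
                and block["hash"] == block_hash(block["index"], block["prev"],
                                                block["evidence"]))


def submit_evidence(nodes, evidence_hash):
    """Writer node proposes a block; report 'success' or 'failure' to the caller."""
    writer = nodes[0]  # assumption: stands in for the node chosen by consensus
    block = {"index": len(writer.chain), "prev": writer.last_hash(),
             "evidence": evidence_hash}
    block["hash"] = block_hash(block["index"], block["prev"], block["evidence"])
    if all(node.verify(block) for node in nodes):
        for node in nodes:
            node.chain.append(dict(block))  # each node stores its own copy
        return "success"
    return "failure"  # the block is discarded and nothing is stored
```

In this sketch a node whose copy of the chain has been altered no longer agrees on the previous hash, so later blocks are rejected and the caller is told explicitly that the write failed.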
In litigation, if a user needs to present the relevant evidence to a court, the user can file an application with the court; once the court has reviewed and approved it, the stamped evidence image can be retrieved from the blockchain system through a terminal device designated by the court and presented in court.
In summary, in the embodiment of the present invention, a server for webpage evidence obtaining (the evidence obtaining server, i.e., the execution subject of this embodiment) is set up in advance, and a user can send a webpage evidence obtaining request to the evidence obtaining server through a terminal device, so that the evidence obtaining server automatically searches the network for webpage evidence containing content that may infringe the user's rights. For example, if the object the user wants to protect is a text object, the text object can be carried in the webpage evidence obtaining request, and the evidence obtaining server automatically searches the network for webpage text that is similar to the text object and captures an image of the web page containing that text as webpage evidence. If the object the user wants to protect is a picture object, the picture object can be carried in the webpage evidence obtaining request, and the evidence obtaining server automatically searches the network for webpage pictures that are similar to the picture object and captures an image of the web page containing each such picture as webpage evidence. The embodiment of the present invention thus helps the user actively obtain evidence from massive network information, discover infringement promptly, and comprehensively protect the user's legitimate interests.
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an order of execution. The execution order of each process should be determined by its function and internal logic, and the sequence numbers should not limit the implementation of the embodiments of the present invention in any way.
Corresponding to the search-based webpage evidence obtaining method described in the above embodiments, fig. 4 shows a block diagram of an embodiment of a webpage evidence obtaining device according to an embodiment of the present invention.
In this embodiment, a web page evidence obtaining apparatus may include:
an evidence obtaining request receiving module 401, configured to receive a webpage evidence obtaining request sent by a terminal device, where the webpage evidence obtaining request includes an evidence obtaining object to be searched, and the evidence obtaining object includes a text object or a picture object;
an evidence obtaining object extraction module 402, configured to extract the evidence obtaining object from the webpage evidence obtaining request and determine the type to which the evidence obtaining object belongs;
a first search module 403, configured to, if the evidence obtaining object is a text object, search the network for first webpage evidence through a preset search engine and collect an image of the web page where the first webpage evidence is located, where the first webpage evidence is a web page text whose text similarity with the text object is greater than a preset first threshold;
a second search module 404, configured to, if the evidence obtaining object is a picture object, search the network for second webpage evidence through the search engine and collect an image of the web page where the second webpage evidence is located, where the second webpage evidence is a web page picture whose image similarity with the picture object is greater than a preset second threshold.
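The division of work among modules 401 to 404 can be sketched as a simple dispatch on the type of the evidence obtaining object. The names `ForensicsRequest` and `handle_request`, and the convention that a text object arrives as `str` and a picture object as `bytes`, are assumptions for illustration; the two search callables stand in for the first and second search modules.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class ForensicsRequest:
    # Assumption: a text object is carried as str, a picture object as bytes.
    forensic_object: Union[str, bytes]


def handle_request(request, search_text, search_picture):
    """Extract the object, determine its type, and route it to the matching search."""
    obj = request.forensic_object
    if isinstance(obj, str):
        return "text", search_text(obj)        # role of the first search module
    if isinstance(obj, (bytes, bytearray)):
        return "picture", search_picture(obj)  # role of the second search module
    raise TypeError("evidence obtaining object must be a text or picture object")
```

Passing the search functions in as parameters keeps the sketch independent of any particular search engine.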
Further, the first search module may include:
a word segmentation processing unit, configured to perform word segmentation on the text object to obtain the words that make up the text object;
a frequency statistics unit, configured to count the frequency with which each word of the text object occurs in a preset corpus, where the corpus includes a preset number of texts;
a candidate keyword selection unit, configured to select, from the words that make up the text object, the CN words that occur least frequently in the corpus as candidate keywords;
a keyword number statistics unit, configured to count the total number of candidate keywords contained in each text paragraph of the text object;
a reference text paragraph selecting unit, configured to select a reference text paragraph from the text paragraphs of the text object, where the reference text paragraph is the text paragraph containing the largest total number of candidate keywords;
a text search unit, configured to search the network, through the search engine, for web page texts that contain a preferred keyword, where a preferred keyword is a candidate keyword that appears in the reference text paragraph;
a text selection unit, configured to select, from the web page texts, the web page text whose text similarity is greater than the first threshold as the first webpage evidence.
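The keyword selection performed by the units above can be sketched as follows. Several details are assumptions for illustration: the whitespace tokenizer `tokenize` is only a stand-in for real word segmentation, ties in frequency are broken alphabetically, and the per-paragraph total counts every occurrence of a candidate keyword. The function name `select_preferred_keywords` is hypothetical.

```python
from collections import Counter

PUNCTUATION = ".,;:!?\"'()"


def tokenize(text):
    """Lowercase whitespace tokenizer; a stand-in for real word segmentation."""
    words = (w.strip(PUNCTUATION).lower() for w in text.split())
    return [w for w in words if w]


def select_preferred_keywords(paragraphs, corpus, cn):
    """Return the reference paragraph and the preferred keywords found in it."""
    # Frequency of every word across the preset corpus.
    corpus_freq = Counter(w for doc in corpus for w in tokenize(doc))
    # Distinct words of the text object.
    words = {w for p in paragraphs for w in tokenize(p)}
    # Candidate keywords: the CN words with the lowest corpus frequency.
    candidates = set(sorted(words, key=lambda w: (corpus_freq[w], w))[:cn])
    # Total number of candidate keyword occurrences in each paragraph.
    totals = [sum(1 for w in tokenize(p) if w in candidates) for p in paragraphs]
    reference = paragraphs[totals.index(max(totals))]
    # Preferred keywords: candidates that actually appear in the reference paragraph.
    preferred = sorted(candidates & set(tokenize(reference)))
    return reference, preferred
```

Rare words are favored because they narrow the search far more than common words do, and the reference paragraph is the part of the text object where those rare words are most concentrated.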
Further, the second search module may include:
a picture searching unit, configured to search the network for candidate webpage pictures through the search engine;
an image similarity calculation unit, configured to calculate the image similarity between each candidate webpage picture and the picture object;
a picture selection unit, configured to select, from the candidate webpage pictures, the webpage pictures whose image similarity is greater than the second threshold as the second webpage evidence.
Further, the image similarity calculation unit may include:
a first ratio calculating subunit, configured to calculate the distribution ratio of the pixels having each color component value in the picture object;
a second ratio calculating subunit, configured to calculate the distribution ratio of the pixels having each color component value in each candidate webpage picture;
a similarity calculating subunit, configured to calculate the image similarity between the nth candidate webpage picture and the picture object according to the following formula:
wherein $\mathrm{DiffRatio}_{pv} = (\mathrm{StRRatio}_{pv} - \mathrm{CdRRatio}_{pv})^2 + (\mathrm{StBRatio}_{pv} - \mathrm{CdBRatio}_{pv})^2 + (\mathrm{StGRatio}_{pv} - \mathrm{CdGRatio}_{pv})^2$; $\mathrm{CdRRatio}_{pv}$ is the distribution ratio of the pixels whose red component value is $pv$ in the nth candidate webpage picture, and $\mathrm{StRRatio}_{pv}$ is the distribution ratio of the pixels whose red component value is $pv$ in the picture object; $\mathrm{CdBRatio}_{pv}$ is the distribution ratio of the pixels whose blue component value is $pv$ in the nth candidate webpage picture, and $\mathrm{StBRatio}_{pv}$ is the distribution ratio of the pixels whose blue component value is $pv$ in the picture object; $\mathrm{CdGRatio}_{pv}$ is the distribution ratio of the pixels whose green component value is $pv$ in the nth candidate webpage picture, and $\mathrm{StGRatio}_{pv}$ is the distribution ratio of the pixels whose green component value is $pv$ in the picture object; $0 \le pv \le \mathrm{PVMax}$, where $\mathrm{PVMax}$ is the maximum pixel value; and $\mathrm{SimDeg}$ is the image similarity between the nth candidate webpage picture and the picture object.
Further, the first ratio calculating subunit is specifically configured to calculate a distribution ratio of pixels of each color component value in the picture object according to the following formula:
wherein $\mathrm{PN1}$ is the total number of pixels of the picture object, $\mathrm{StRPixNum}_{pv}$ is the total number of pixels in the picture object whose red component value is $pv$, $\mathrm{StBPixNum}_{pv}$ is the total number of pixels in the picture object whose blue component value is $pv$, and $\mathrm{StGPixNum}_{pv}$ is the total number of pixels in the picture object whose green component value is $pv$.
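The color-distribution comparison described by the subunits above can be sketched in plain Python. Two points are assumptions, since the formulas themselves are not reproduced in this text: each distribution ratio is taken to be the pixel count for a component value divided by the total number of pixels, and the per-value DiffRatio terms are simply summed over all values from 0 to PVMax. The mapping from that sum to SimDeg is not shown in the text, so the sketch stops at the summed difference. The function names and the representation of a picture as a list of `(r, g, b)` tuples are hypothetical.

```python
def channel_ratios(pixels, channel, pv_max=255):
    """Distribution ratio of each value 0..pv_max in one channel (0=R, 1=G, 2=B)."""
    if not pixels:
        raise ValueError("picture must contain at least one pixel")
    counts = [0] * (pv_max + 1)
    for pixel in pixels:
        counts[pixel[channel]] += 1
    total = len(pixels)  # plays the role of PN1 for the picture object
    return [count / total for count in counts]


def diff_ratio_sum(standard, candidate, pv_max=255):
    """Sum of DiffRatio over all component values; 0.0 means identical distributions."""
    total = 0.0
    for channel in range(3):
        st = channel_ratios(standard, channel, pv_max)
        cd = channel_ratios(candidate, channel, pv_max)
        total += sum((s - c) ** 2 for s, c in zip(st, cd))
    return total
```

Because ratios rather than raw counts are compared, two pictures with the same color distribution but different sizes yield a difference of zero; a smaller sum indicates more similar color distributions.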
It will be clearly understood by those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described apparatus, modules and units may refer to corresponding procedures in the foregoing method embodiments, and are not repeated herein.
In the foregoing embodiments, each embodiment is described with its own emphasis. For parts that are not described or illustrated in detail in a particular embodiment, reference may be made to the related descriptions of the other embodiments.
Fig. 5 shows a schematic block diagram of a server according to an embodiment of the present invention, and for convenience of explanation, only a portion related to the embodiment of the present invention is shown.
In this embodiment, the server 5 may be a computing device such as a desktop computer, a notebook computer, a palm computer, and a cloud server. The server 5 may include: a processor 50, a memory 51, and computer readable instructions 52 stored in the memory 51 and executable on the processor 50, such as computer readable instructions for performing the search-based web page forensics method described above. The processor 50, when executing the computer readable instructions 52, implements the steps described above in various search-based web page forensics method embodiments, such as steps S101 through S104 shown in fig. 1. Alternatively, the processor 50, when executing the computer readable instructions 52, performs the functions of the modules/units of the apparatus embodiments described above, such as the functions of modules 401 through 404 shown in fig. 4.
Illustratively, the computer readable instructions 52 may be partitioned into one or more modules/units that are stored in the memory 51 and executed by the processor 50 to accomplish the present invention. The one or more modules/units may be a series of computer readable instruction segments capable of performing specific functions, and the instruction segments describe the execution process of the computer readable instructions 52 in the server 5.
The processor 50 may be a central processing unit (Central Processing Unit, CPU), but may also be other general purpose processors, digital signal processors (Digital Signal Processor, DSPs), application specific integrated circuits (Application Specific Integrated Circuit, ASICs), field programmable gate arrays (Field-Programmable Gate Array, FPGAs) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory 51 may be an internal storage unit of the server 5, for example, a hard disk or an internal memory of the server 5. The memory 51 may also be an external storage device of the server 5, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash memory card (Flash Card) provided on the server 5. Further, the memory 51 may include both an internal storage unit and an external storage device of the server 5. The memory 51 is used to store the computer readable instructions as well as other instructions and data required by the server 5. The memory 51 may also be used to temporarily store data that has been output or is to be output.
The functional units in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part of it that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and comprises a number of computer readable instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing computer readable instructions.
The above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A search-based web page evidence obtaining method, comprising:
receiving a webpage evidence obtaining request sent by terminal equipment, wherein the webpage evidence obtaining request comprises an evidence obtaining object to be searched, and the evidence obtaining object comprises a text object or a picture object;
extracting the evidence obtaining object from the webpage evidence obtaining request, and judging the type of the evidence obtaining object;
if the evidence obtaining object is a text object, searching a first webpage evidence in a network through a preset search engine, and collecting images of webpages where the first webpage evidence is located, wherein the first webpage evidence is a webpage text with a text similarity greater than a preset first threshold value with a reference text paragraph of the text object, the reference text paragraph is a text paragraph with the largest total number of candidate keywords, and the candidate keywords are a plurality of words with the smallest occurrence frequency in a preset corpus;
if the evidence obtaining object is a picture object, searching a second webpage evidence in a network through the search engine, and collecting an image of a webpage where the second webpage evidence is located, wherein the second webpage evidence is a webpage picture with the image similarity with the picture object being larger than a preset second threshold value.
2. The method for collecting evidence from a web page according to claim 1, wherein searching the network for the first web page evidence by a preset search engine comprises:
word segmentation processing is carried out on the text object to obtain each word forming the text object;
respectively counting the occurrence frequency of each word forming the text object in the corpus, wherein the corpus comprises a preset number of text information;
selecting the top CN words with the least frequency of occurrence in the corpus from the words forming the text object as the candidate keywords;
respectively counting the total number of the candidate keywords contained in each text paragraph of the text object;
selecting the reference text passage from each text passage of the text object;
searching, by the search engine, for each piece of web page text in a network that contains a preferred keyword, the preferred keyword being a candidate keyword that appears in the passage of reference text;
and selecting the webpage text with the text similarity with the reference text paragraph larger than the first threshold value from the webpage texts as the first webpage evidence.
3. The method of claim 1, wherein searching for second web page evidence in the network by the search engine comprises:
searching each candidate webpage picture in the network through the search engine;
respectively calculating the image similarity between each candidate webpage picture and the picture object;
and selecting the webpage picture with the image similarity larger than the second threshold value from the candidate webpage pictures as the second webpage evidence.
4. The web page evidence obtaining method according to claim 3, wherein the calculating the image similarity between the picture object and each candidate web page picture includes:
calculating the distribution ratio of the pixel points with the color component values in the picture object, and respectively calculating the distribution ratio of the pixel points with the color component values in each candidate webpage picture;
calculating the image similarity between the nth candidate webpage picture and the picture object according to the following formula:
wherein $\mathrm{DiffRatio}_{pv} = (\mathrm{StRRatio}_{pv} - \mathrm{CdRRatio}_{pv})^2 + (\mathrm{StBRatio}_{pv} - \mathrm{CdBRatio}_{pv})^2 + (\mathrm{StGRatio}_{pv} - \mathrm{CdGRatio}_{pv})^2$; $\mathrm{CdRRatio}_{pv}$ is the distribution ratio of the pixels whose red component value is $pv$ in the nth candidate webpage picture, and $\mathrm{StRRatio}_{pv}$ is the distribution ratio of the pixels whose red component value is $pv$ in the picture object; $\mathrm{CdBRatio}_{pv}$ is the distribution ratio of the pixels whose blue component value is $pv$ in the nth candidate webpage picture, and $\mathrm{StBRatio}_{pv}$ is the distribution ratio of the pixels whose blue component value is $pv$ in the picture object; $\mathrm{CdGRatio}_{pv}$ is the distribution ratio of the pixels whose green component value is $pv$ in the nth candidate webpage picture, and $\mathrm{StGRatio}_{pv}$ is the distribution ratio of the pixels whose green component value is $pv$ in the picture object; $0 \le pv \le \mathrm{PVMax}$, where $\mathrm{PVMax}$ is the maximum pixel value; and $\mathrm{SimDeg}$ is the image similarity between the nth candidate webpage picture and the picture object.
5. The method according to claim 4, wherein calculating the distribution ratio of pixels of each color component value in the picture object comprises:
calculating the distribution ratio of pixel points of each color component value in the picture object according to the following formula:
wherein $\mathrm{PN1}$ is the total number of pixels of the picture object, $\mathrm{StRPixNum}_{pv}$ is the total number of pixels in the picture object whose red component value is $pv$, $\mathrm{StBPixNum}_{pv}$ is the total number of pixels in the picture object whose blue component value is $pv$, and $\mathrm{StGPixNum}_{pv}$ is the total number of pixels in the picture object whose green component value is $pv$.
6. A web page evidence obtaining apparatus, comprising:
an evidence obtaining request receiving module, configured to receive a webpage evidence obtaining request sent by a terminal device, wherein the webpage evidence obtaining request comprises an evidence obtaining object to be searched, and the evidence obtaining object comprises a text object or a picture object;
the evidence obtaining object extracting module is used for extracting the evidence obtaining object from the webpage evidence obtaining request and judging the type of the evidence obtaining object;
the first search module is used for searching first webpage evidence in a network through a preset search engine if the evidence obtaining object is a text object, and collecting images of webpages where the first webpage evidence is located, wherein the first webpage evidence is a webpage text with a text similarity greater than a preset first threshold value with a reference text paragraph of the text object, the reference text paragraph is a text paragraph with the largest total number of candidate keywords, and the candidate keywords are a plurality of words with the smallest occurrence frequency in a preset corpus;
the second search module is used for searching second webpage evidence in a network through the search engine if the evidence obtaining object is a picture object, and collecting images of webpages where the second webpage evidence is located, wherein the second webpage evidence is a webpage picture with the image similarity with the picture object being larger than a preset second threshold value.
7. The web page forensics device according to claim 6 wherein the first search module comprises:
the word segmentation processing unit is used for carrying out word segmentation processing on the text object to obtain each word forming the text object;
the frequency statistics unit is used for respectively counting the frequency of each word forming the text object in the corpus, wherein the corpus comprises a preset number of text information;
a candidate keyword selection unit, configured to select, from among the words that constitute the text object, the first CN words that occur in the corpus with the least frequency as the candidate keywords;
a keyword number statistics unit, configured to respectively count total numbers of the candidate keywords included in each text paragraph of the text object;
a reference text paragraph selecting unit, configured to select the reference text paragraph from each text paragraph of the text object;
A text search unit for searching, by the search engine, each web page text in a network that contains a preferred keyword, the preferred keyword being a candidate keyword that appears in the reference text paragraph;
and the text selection unit is used for selecting the webpage text with the text similarity larger than the first threshold value from the webpage texts as the first webpage evidence.
8. The web page forensics device according to claim 6 wherein the second search module comprises:
the picture searching unit is used for searching each candidate webpage picture in the network through the search engine;
the image similarity calculation unit is used for calculating the image similarity between each candidate webpage picture and the picture object respectively;
and the picture selection unit is used for selecting the webpage picture with the image similarity larger than the second threshold value from the candidate webpage pictures as the second webpage evidence.
9. A computer readable storage medium storing computer readable instructions which when executed by a processor implement the steps of the web page forensic method according to any one of claims 1 to 5.
10. A server comprising a memory, a processor, and computer readable instructions stored in the memory and executable on the processor, wherein execution of the computer readable instructions by the processor implements the steps of the web page forensic method according to any one of claims 1 to 5.
CN201910652647.5A 2019-07-19 2019-07-19 Webpage evidence obtaining method and device based on search, readable storage medium and server Active CN110457434B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910652647.5A CN110457434B (en) 2019-07-19 2019-07-19 Webpage evidence obtaining method and device based on search, readable storage medium and server
PCT/CN2019/118134 WO2021012521A1 (en) 2019-07-19 2019-11-13 Search-based webpage forensics method and device, readable storage medium and server

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910652647.5A CN110457434B (en) 2019-07-19 2019-07-19 Webpage evidence obtaining method and device based on search, readable storage medium and server

Publications (2)

Publication Number Publication Date
CN110457434A CN110457434A (en) 2019-11-15
CN110457434B true CN110457434B (en) 2023-10-27

Family

ID=68481517

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910652647.5A Active CN110457434B (en) 2019-07-19 2019-07-19 Webpage evidence obtaining method and device based on search, readable storage medium and server

Country Status (2)

Country Link
CN (1) CN110457434B (en)
WO (1) WO2021012521A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111242790B (en) * 2020-01-02 2020-11-17 平安科技(深圳)有限公司 Risk identification method, electronic device and storage medium
CN112182329B (en) * 2020-09-14 2023-04-18 浙江数秦科技有限公司 Network picture infringement monitoring and automatic evidence obtaining method
CN113032735B (en) * 2021-05-21 2021-08-17 浙江数秦科技有限公司 Digital asset evidence and infringement monitoring system and method based on block chain technology

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106844717A (en) * 2017-02-08 2017-06-13 北京小米移动软件有限公司 Webpage search display methods and device
CN107798070A (en) * 2017-09-26 2018-03-13 平安普惠企业管理有限公司 A kind of web data acquisition methods and terminal device
CN107832384A (en) * 2017-10-28 2018-03-23 北京安妮全版权科技发展有限公司 Infringement detection method, device, storage medium and electronic equipment
CN109857893A (en) * 2019-01-16 2019-06-07 平安科技(深圳)有限公司 Picture retrieval method, device, computer equipment and storage medium

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4898934B2 (en) * 2010-03-29 2012-03-21 株式会社Ubic Forensic system, forensic method, and forensic program
US9197613B2 (en) * 2011-12-20 2015-11-24 Industrial Technology Research Institute Document processing method and system
CN103324650A (en) * 2012-10-23 2013-09-25 深圳市宜搜科技发展有限公司 Image retrieval method and system
CN103150904A (en) * 2013-02-05 2013-06-12 中山大学 Bayonet vehicle image identification method based on image features
CN105022752B (en) * 2014-04-29 2019-04-05 中国电信股份有限公司 Image search method and device
CN106650799B (en) * 2016-12-08 2019-05-31 重庆邮电大学 A kind of electronic evidence classification extracting method and system

Also Published As

Publication number Publication date
CN110457434A (en) 2019-11-15
WO2021012521A1 (en) 2021-01-28

Similar Documents

Publication Publication Date Title
CN107463605B (en) Method and device for identifying low-quality news resource, computer equipment and readable medium
Hakak et al. Approaches for preserving content integrity of sensitive online Arabic content: A survey and research challenges
CN110457434B (en) Webpage evidence obtaining method and device based on search, readable storage medium and server
CA3138730C (en) Public-opinion analysis method and system for providing early warning of enterprise risks
WO2021143497A1 (en) Infringement evidence storage method, apparatus, and device based on evidence storage blockchain
US9785989B2 (en) Determining a characteristic group
CN110245469B (en) Webpage watermark generation method, watermark analysis method, device and storage medium
KR101627398B1 (en) System and method for protecting personal contents right using context-based search engine
CN106874253A (en) Recognize the method and device of sensitive information
US10339373B1 (en) Optical character recognition utilizing hashed templates
CN109977684B (en) Data transmission method and device and terminal equipment
CN108881230B (en) Secure transmission method and device for government affair big data
CN110019640B (en) Secret-related file checking method and device
JP6695987B2 (en) Advertisement generation method, computer-readable storage medium and system
JP6169277B2 (en) Digital content monitoring system for ensuring consistency of digital content
CN110472128B (en) Webpage evidence obtaining method and device based on image recognition, storage medium and server
TW201421267A (en) Searching system and method
CN111027065B (en) Leucavirus identification method and device, electronic equipment and storage medium
CN105354506B (en) The method and apparatus of hidden file
CN112115423A (en) Electronic notarization information processing method, device, system, equipment and storage medium
Chen et al. Fraud analysis and detection for real-time messaging communications on social networks
US11681966B2 (en) Systems and methods for enhanced risk identification based on textual analysis
US20210064662A1 (en) Data collection system for effectively processing big data
CN105956482A (en) Method and system for data leakage protection
CN111736939A (en) Page self-adaptive adjusting method and device, storage medium and computer equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant