CN108764352A - Duplicate pages content detection algorithm and device - Google Patents

Duplicate pages content detection algorithm and device

Info

Publication number
CN108764352A
Authority
CN
China
Prior art keywords
segment
image
duplicate pages
similarity
described image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810545595.7A
Other languages
Chinese (zh)
Other versions
CN108764352B (en)
Inventor
柏馨
张婷
� 崔
崔一
项金鑫
尹飞
刘盼盼
薛大伟
魏晨辉
邢潘红
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201810545595.7A
Publication of CN108764352A
Application granted
Publication of CN108764352B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/11 Region-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The present invention proposes a duplicate page content detection method and device. The method includes: taking a screenshot of the interface of the page to be detected to obtain an image of the page; identifying segmentation regions in the image according to the grayscale information of the image; splitting the image according to the segmentation regions to obtain multiple segments; clustering the segments according to their image features, and determining the similarity between segments that belong to the same cluster; and determining, according to that similarity, whether duplicate page content exists. By using image processing to split the screenshot and compare the resulting segments, the method detects repeated page content. It solves the technical problem in the prior art that automated verification requires a large number of additional reference objects and therefore has a high testing cost, and it improves the efficiency of duplicate page content detection.

Description

Duplicate page content detection method and device
Technical field
The present invention relates to the technical field of mobile terminals, and in particular to a duplicate page content detection method and device.
Background
Front-end pages such as application (APP) pages and web pages can suffer from repeated page content, so duplicate page content testing has become an important part of testing. In actual use, repeated page content not only seriously degrades the user experience but also wastes network resources.
In the prior art, repeated page content is still detected manually, and the detection accuracy depends on the tester's experience. Automated verification techniques have also appeared, but they require additional reference objects against which the page content is compared. This approach is limited by the reference objects, so its ability to find repeated page content is weak; improving the detection accuracy requires preparing a large number of additional reference objects, which makes testing costly. In short, duplicate page content detection in the prior art is inefficient.
Summary of the invention
The present invention aims to solve, at least to some extent, one of the technical problems in the related art.
To this end, the present invention proposes a duplicate page content detection method that completes duplicate page content detection efficiently. It solves the technical problems in the prior art that manual detection depends on human experience and that automated verification requires a large number of additional reference objects at high testing cost, and it improves the efficiency of duplicate page content detection.
The present invention also proposes a duplicate page content detection device.
The present invention proposes a kind of computer equipment.
The present invention proposes a kind of computer readable storage medium.
An embodiment of the first aspect of the present invention proposes a duplicate page content detection method, including:
Taking a screenshot of the interface of the page to be detected to obtain an image of the page;
Identifying segmentation regions in the image according to the grayscale information of the image;
Splitting the image according to the segmentation regions to obtain multiple segments;
Clustering the segments according to their image features, and determining the similarity between segments that belong to the same cluster;
Determining, according to the similarity between segments that belong to the same cluster, whether duplicate page content exists.
The duplicate page content detection method of the embodiment of the present invention takes a screenshot of the interface of the page to be detected to obtain an image of the page; identifies segmentation regions in the image according to the grayscale information of the image; splits the image according to the segmentation regions to obtain multiple segments; clusters the segments according to their image features and determines the similarity between segments that belong to the same cluster; and determines, according to that similarity, whether duplicate page content exists. The method uses image processing to split the screenshot and compare the resulting segments, so duplicate page content can be detected without preparing additional reference objects. This solves the technical problems in the prior art that manual detection depends on human experience and that automated verification requires a large number of additional reference objects at high testing cost, and it improves the efficiency of duplicate page content detection.
An embodiment of the second aspect of the present invention proposes a duplicate page content detection device, including:
An acquisition module, configured to take a screenshot of the interface of the page to be detected to obtain an image of the page;
An identification module, configured to identify segmentation regions in the image according to the grayscale information of the image;
A segmentation module, configured to split the image according to the segmentation regions to obtain multiple segments;
A clustering module, configured to cluster the segments according to their image features and to determine the similarity between segments that belong to the same cluster;
A detection module, configured to determine, according to the similarity between segments that belong to the same cluster, whether duplicate page content exists.
The duplicate page content detection device of the embodiment of the present invention takes a screenshot of the interface of the page to be detected to obtain an image of the page; identifies segmentation regions in the image according to the grayscale information of the image; splits the image according to the segmentation regions to obtain multiple segments; clusters the segments according to their image features and determines the similarity between segments that belong to the same cluster; and determines, according to that similarity, whether duplicate page content exists. The device uses image processing to split the screenshot and compare the resulting segments, so duplicate page content can be detected without preparing additional reference objects. This solves the technical problems in the prior art that manual detection depends on human experience and that automated verification requires a large number of additional reference objects at high testing cost, and it improves the efficiency of duplicate page content detection.
An embodiment of the third aspect of the present invention proposes a computer device, including: a processor; and a memory for storing instructions executable by the processor; wherein the processor reads the executable program code stored in the memory and runs a program corresponding to that code, so as to execute the duplicate page content detection method described in the embodiment of the first aspect.
An embodiment of the fourth aspect of the present invention proposes a computer-readable storage medium; when the instructions in the storage medium are executed by a processor, the duplicate page content detection method described in the embodiment of the first aspect is executed.
Additional aspects and advantages of the present invention will be set forth in part in the following description, and in part will become apparent from that description or be learned through practice of the invention.
Brief description of the drawings
The above and/or additional aspects and advantages of the present invention will become apparent and easy to understand from the following description of the embodiments with reference to the accompanying drawings, in which:
Fig. 1 is a flow diagram of a duplicate page content detection method provided by an embodiment of the present invention;
Fig. 2 is a flow diagram of a duplicate page content detection method provided by Embodiment Two of the present invention;
Fig. 3 is a flow diagram of another duplicate page content detection method provided by an embodiment of the present invention;
Fig. 4 is an example of a significant problem found in a mobile product using the duplicate page content detection method of the present invention;
Fig. 5 is a structural diagram of a duplicate page content detection device provided by an embodiment of the present invention; and
Fig. 6 is a block diagram of an exemplary computer device suitable for implementing embodiments of the present application.
Detailed description
Embodiments of the present invention are described in detail below, and examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numerals throughout denote the same or similar elements or elements having the same or similar functions. The embodiments described below with reference to the accompanying drawings are exemplary; they are intended to explain the present invention and should not be construed as limiting it.
Mobile terminal devices are now used ever more widely in daily life. Users find that front-end pages such as APP pages and web pages contain repeated page content, which leads to a poor user experience, so duplicate page content testing has become an important part of testing. In the prior art, duplicate page content is detected with a reference-based image comparison method: precise reference screenshots are captured manually at each step of a preset automation path, and every step of the automated run must match the corresponding reference screenshot exactly. In actual detection, however, the traditional manual detection technique has obvious limitations and cannot find duplicate page content effectively.
To address the limitations and unsatisfactory results of the prior-art duplicate page content detection techniques described above, the embodiments of the present invention detect repeated page content by image processing, and no additional reference object is needed during detection. After a screenshot is obtained on a mobile phone, the page image is first split into multiple segments; the segments are then clustered according to their image features, and the similarity between segments in the same cluster is determined; finally, it is determined whether duplicate page content exists. For ease of understanding, several essential terms used in the present invention are introduced first:
First similarity: the threshold on the picture similarity between two segments that is used when determining whether duplicate page content exists between them.
Second similarity: the threshold on the text similarity between two segments that is used when determining whether duplicate page content exists between them.
The method and device of the embodiments of the present invention are described below with reference to the accompanying drawings.
Fig. 1 is a flow diagram of a duplicate page content detection method provided by an embodiment of the present invention.
As shown in Fig. 1, the duplicate page content detection method includes the following steps:
Step 101: take a screenshot of the interface of the page to be detected to obtain an image of the page.
Specifically, before the page content is detected, an image of the page to be detected must first be obtained. In this embodiment, the screenshot of the page to be detected is taken on a mobile device; mobile devices running different operating systems use different operations to take the screenshot, and the image of the page to be detected is finally obtained.
Step 102: identify segmentation regions in the image according to the grayscale information of the image.
It should be noted that grayscale represents an object using shades of black, that is, the image is displayed with black as the reference color at different saturations. Each grayscale object has a brightness value from 0% (white) to 100% (black). The grayscale information of an image is the gray value of each pixel in the image obtained after grayscale conversion. A pixel is the smallest unit of an image; an image consists of many pixels arranged in an array.
Further, the color of each pixel in a color image is determined by three components, R, G, and B, and each component can take 256 values (0 to 255), so a single pixel can take more than 16 million colors (256 × 256 × 256 = 16,777,216). A grayscale image is a special color image whose R, G, and B components are equal, so a single pixel takes only 256 values. For this reason, digital image processing usually converts images of various formats to grayscale first, to reduce the amount of subsequent computation. Like a color image, a grayscale image still reflects the distribution and characteristics of the overall and local chromaticity and brightness levels of the whole image.
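As an illustration of the grayscale conversion described above, the following is a minimal sketch. The source does not state which conversion formula is used; the weighted sum with the BT.601 luma weights (0.299, 0.587, 0.114) and the function name `to_gray` are assumptions made for this example.

```python
def to_gray(rgb_image):
    """Convert a nested list of (R, G, B) tuples (0-255 each) to gray values (0-255)."""
    gray = []
    for row in rgb_image:
        # Weighted sum of the three components; the BT.601 weights are an
        # assumption for illustration, not taken from the source.
        gray.append([round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row])
    return gray


# A tiny 2 x 2 "screenshot": white, black, pure red, mid gray.
image = [[(255, 255, 255), (0, 0, 0)],
         [(255, 0, 0), (128, 128, 128)]]
print(to_gray(image))
```

Each pixel shrinks from three components to one, which is the reduction in later computation that the paragraph above refers to.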
Because the image of the page to be detected in this embodiment is in color, after the image is obtained it is first converted to grayscale to obtain its grayscale information, and the segmentation regions are then identified in the image. Grayscale conversion is the process of transforming a color image into a grayscale image.
Optionally, the segmentation regions are identified in the image of the page to be detected as follows: the rows or columns of the pixel array are searched to find at least one row or at least one column of pixels with identical gray values; the rows or columns found are taken as segmentation regions; and adjacent segmentation regions are then merged.
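The row search and merge described above might be sketched as follows. This is only an illustration under simple assumptions: the image is a nested list of gray values, "identical gray" is taken literally as every pixel in the row being equal, and the helper names `uniform_rows` and `merge_adjacent` are hypothetical. Columns could be handled the same way after transposing the image.

```python
def uniform_rows(gray):
    """Return the indices of rows whose pixels all share one gray value."""
    return [y for y, row in enumerate(gray) if len(set(row)) == 1]


def merge_adjacent(indices):
    """Merge consecutive row indices into inclusive (start, end) bands."""
    bands = []
    for i in indices:
        if bands and i == bands[-1][1] + 1:
            bands[-1] = (bands[-1][0], i)  # extend the current band
        else:
            bands.append((i, i))           # start a new band
    return bands


gray = [
    [255, 255, 255, 255],  # blank separator row
    [255, 255, 255, 255],  # blank separator row
    [10, 200, 30, 40],     # content
    [12, 180, 35, 41],     # content
    [255, 255, 255, 255],  # blank separator row
    [90, 90, 20, 20],      # content
]
bands = merge_adjacent(uniform_rows(gray))
print(bands)  # [(0, 1), (4, 4)]
```

Rows 0 and 1 are adjacent and uniform, so they form one segmentation region; row 4 forms a second one.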
Step 103: split the image according to the segmentation regions to obtain multiple segments.
Specifically, image segmentation is the technique and process of dividing an image into several specific regions with distinctive properties and extracting the objects of interest. The main image segmentation methods at present include threshold-based segmentation, region-based segmentation, and edge-based segmentation. This embodiment uses an image segmentation method based on edge detection. The basic idea is to first determine the edge pixels in the image and then link these pixels together to form the required region boundaries. The segmentation problem is solved by detecting the edges between different regions, that is, by detecting the places where the gray level or structure changes abruptly, which indicate where one region ends and another begins. This discontinuity is called an edge. Different regions of an image have different gray levels, and their boundaries generally show distinct edges; this feature can be used to split the image.
In one possible implementation, the image to be detected is split along the edges of the segmentation regions to obtain multiple segments. Specifically, the gray values of edge pixels in an image are discontinuous, and this discontinuity can be detected by taking derivatives.
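To make the idea of detecting a gray-level discontinuity by a derivative concrete, here is a minimal one-dimensional sketch. It uses the first difference between neighboring pixels as a discrete derivative; the function name `edge_positions` and the `min_jump` threshold are assumptions for illustration, and the source does not specify which derivative operator is used.

```python
def edge_positions(profile, min_jump):
    """Return indices i where the gray value jumps by at least min_jump
    between pixel i and pixel i + 1 (a first-difference derivative)."""
    return [i for i in range(len(profile) - 1)
            if abs(profile[i + 1] - profile[i]) >= min_jump]


# Gray values along one column: light background, dark block, light background.
profile = [250, 250, 249, 40, 42, 41, 250, 250]
print(edge_positions(profile, min_jump=100))  # [2, 5]
```

The two reported positions mark where the dark block begins and ends, which is where a split along the edge would be made.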
In another possible implementation, a dividing line is determined inside each segmentation region, and the image to be detected is split along the dividing lines to obtain multiple segments.
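Splitting along a dividing line inside each segmentation region could look like the sketch below. The source does not say where inside the region the line is placed; taking the middle row of each band is an assumption, as is the helper name `split_rows`. Bands are inclusive (start, end) row ranges.

```python
def split_rows(height, bands):
    """Cut the row range [0, height) at the middle row of each band.

    Returns (top, bottom) pairs with bottom exclusive. Choosing the middle
    row as the dividing line is an assumption for illustration.
    """
    cuts = sorted({(start + end) // 2 for start, end in bands})
    edges = [0] + [c for c in cuts if 0 < c < height] + [height]
    return [(a, b) for a, b in zip(edges, edges[1:]) if b > a]


print(split_rows(100, [(30, 33), (60, 60)]))  # [(0, 31), (31, 60), (60, 100)]
```

A band touching the top of the image produces a cut at row 0, which is dropped so that no empty segment is created.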
It should be noted that after the image to be detected is split into multiple segments, the proportion of the image area that each segment occupies must also be determined, and segments whose area proportion is below a threshold proportion are deleted. Segments with too small an area proportion have little influence on clustering, so they are removed.
Step 104: cluster the segments according to their image features, and determine the similarity between segments that belong to the same cluster.
It should be noted that after the image to be detected is split into multiple segments, each segment first undergoes color space conversion from RGB space to HSV space; color features are then extracted from the converted segment; finally, the extracted color features and the segment's area proportion in the image are used together as its image features. The segments are grouped by clustering according to their image features, so that segments in the same group have similar image features.
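The grouping step can be sketched as follows. The source does not name a clustering algorithm, so this example swaps in a simple greedy scheme: each segment joins the first cluster whose first member lies within a distance threshold, otherwise it starts a new cluster. The feature vectors (three color-histogram bins plus an area proportion), the `max_distance` value, and the function names are all hypothetical.

```python
def feature_distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def greedy_cluster(features, max_distance):
    """Group feature vectors; returns clusters as lists of indices.

    A greedy threshold scheme used only for illustration; the source does
    not specify the clustering algorithm.
    """
    clusters = []
    for i, feat in enumerate(features):
        for cluster in clusters:
            # Compare against the first member of the cluster.
            if feature_distance(features[cluster[0]], feat) <= max_distance:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


# Hypothetical features: three color-histogram bins plus the area proportion.
features = [
    (0.60, 0.30, 0.10, 0.20),
    (0.10, 0.10, 0.80, 0.05),
    (0.58, 0.32, 0.10, 0.21),
    (0.12, 0.08, 0.80, 0.05),
]
print(greedy_cluster(features, max_distance=0.1))  # [[0, 2], [1, 3]]
```

Only segments that land in the same cluster are compared pairwise afterwards, which keeps the number of detailed similarity checks small.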
Specifically, a color image can be expressed in many color space models; image processing commonly uses the RGB model and the HSV model. The RGB model is a color space based on the three primary colors of human vision: red (R), green (G), and blue (B). It holds that mixing red, green, and blue light in suitable proportions can produce the perception of any color in the spectrum. The HSV model is a color space based on the characteristics of human visual perception: hue (H) distinguishes colors such as red, green, and blue; saturation (S) indicates the depth of a color, such as dark blue or light blue; value (V) indicates how light or dark a color is, such as very bright or very dark. The conversion from RGB space to HSV space is performed with a conversion formula. The formula has several forms of expression based on the same principle; this embodiment uses the following conversion formula: [conversion formula missing from this text]
In the formula above, R, G, and B each range over [0, 255]; H ranges over [0, 360]; S ranges over [0, 1]; and V ranges over [0, 255]. In practical image processing, H, S, and V are usually normalized to [0, 1].
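For reference, a commonly used form of the RGB to HSV conversion that matches the value ranges stated above can be sketched as follows. This is the standard textbook conversion and is offered as an assumption; it is not confirmed to be the exact expression used in the original document. The function name `rgb_to_hsv` is hypothetical, and the hue it returns lies in [0, 360).

```python
def rgb_to_hsv(r, g, b):
    """Convert R, G, B in [0, 255] to H in [0, 360), S in [0, 1], V in [0, 255].

    Standard conversion, assumed here for illustration.
    """
    v = max(r, g, b)
    delta = v - min(r, g, b)
    s = 0.0 if v == 0 else delta / v
    if delta == 0:
        h = 0.0  # gray pixels have no defined hue; 0 is used by convention
    elif v == r:
        h = 60 * (((g - b) / delta) % 6)
    elif v == g:
        h = 60 * ((b - r) / delta + 2)
    else:
        h = 60 * ((r - g) / delta + 4)
    return h, s, v


h, s, v = rgb_to_hsv(0, 255, 0)
print(h, s, v)                # 120.0 1.0 255
print(h / 360, s, v / 255)    # the same color with H, S, V scaled toward [0, 1]
```

Dividing H by 360 and V by 255 gives the normalized values mentioned in the paragraph above.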
It should be noted that after color space conversion, color features are first extracted from the converted segment. For the HSV color space, a color histogram is generally used for feature extraction, and the histogram matching methods include histogram intersection, the distance method, the central moment method, the reference color table method, and the cumulative color histogram method. Converting from the RGB color space to the HSV color space separates hue, saturation, and value, which makes it easier to identify picture similarity accurately from color features.
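Of the matching methods listed above, histogram intersection is the simplest to illustrate. The sketch below builds a normalized hue histogram for each segment and sums the bin-wise minima. The bin count of 6, the use of hue alone, and the function names are assumptions for this example, and the hue lists are assumed to be non-empty.

```python
def hue_histogram(hues, bins=6):
    """Normalized histogram of hue values given in degrees."""
    counts = [0] * bins
    for h in hues:
        counts[int(h % 360 / 360 * bins)] += 1
    total = len(hues)  # assumed non-empty
    return [c / total for c in counts]


def histogram_intersection(h1, h2):
    """Sum of bin-wise minima; 1.0 means identical normalized histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))


hist_a = hue_histogram([0, 10, 20, 200])    # mostly red, some blue
hist_b = hue_histogram([5, 15, 210, 220])   # half red, half blue
print(histogram_intersection(hist_a, hist_b))
```

A full color feature would typically also bin saturation and value; hue alone is used here to keep the example short.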
Further, the segments are grouped by clustering according to the extracted image features, and the similarity between segments that belong to the same cluster is determined. Specifically, for two segments in the same cluster, the picture similarity between them is determined first. If the picture similarity is greater than the first similarity, text recognition is then performed on each of the two segments to obtain the text content and text positions. Finally, the text similarity of the two segments is determined from the text content and text positions.
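The source does not say how text content and text positions are combined into a text similarity. The following sketch shows one possible scheme under stated assumptions: the recognized text is already available as (text, (x, y)) pairs with positions relative to the segment's top-left corner, items are paired in order, a pair counts only if the positions are within `max_offset` pixels, and content is compared with the standard-library `difflib.SequenceMatcher`. The function name and parameters are hypothetical.

```python
from difflib import SequenceMatcher


def text_similarity(items_a, items_b, max_offset=5):
    """Mean content similarity over text items paired in order.

    items_*: lists of (text, (x, y)). A pair scores 0 when the two positions
    differ by more than max_offset pixels. Illustrative scheme only.
    """
    if not items_a or len(items_a) != len(items_b):
        return 0.0
    scores = []
    for (text_a, (xa, ya)), (text_b, (xb, yb)) in zip(items_a, items_b):
        close = abs(xa - xb) <= max_offset and abs(ya - yb) <= max_offset
        scores.append(SequenceMatcher(None, text_a, text_b).ratio() if close else 0.0)
    return sum(scores) / len(scores)


seg_a = [("Top stories", (10, 8)), ("Read more", (10, 40))]
seg_b = [("Top stories", (11, 8)), ("Read more", (10, 41))]
seg_c = [("Top stories", (200, 8)), ("Read more", (200, 40))]
print(text_similarity(seg_a, seg_b))  # same text at nearly the same positions
print(text_similarity(seg_a, seg_c))  # same text at different positions
```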
The picture similarity between two segments is determined by template matching and the correlation coefficient method. Specifically, template matching finds the position of a target template within an image frame: the place most similar to the template is the target. All subregions of the full image are compared with the target template, and the subregion most similar to the template gives the position of the target. The similarity between the two is then measured by computing the correlation coefficient between the target template and the subregion. In this embodiment, a matrix is first generated for each of the two segments from the pixel values of its pixels; the picture similarity between the two segments is then determined from the correlation between the two matrices.
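The correlation between two segment matrices can be illustrated with the Pearson correlation coefficient, as in the sketch below. The source only says "correlation coefficient", so the choice of Pearson correlation, the assumption that both matrices have the same size, and the handling of flat (constant) segments are assumptions for this example.

```python
def correlation(a, b):
    """Pearson correlation between two equally sized 2D gray matrices."""
    xs = [v for row in a for v in row]
    ys = [v for row in b for v in row]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A flat segment has no variation; treat it as similar only when
        # both matrices are identical (an assumption for this sketch).
        return 1.0 if xs == ys else 0.0
    return cov / (var_x * var_y) ** 0.5


seg = [[10, 200], [30, 90]]
inverted = [[245, 55], [225, 165]]   # 255 minus each value of seg
print(correlation(seg, seg))         # close to 1: same picture
print(correlation(seg, inverted))    # close to -1: opposite pattern
```

A value near 1 would then be compared with the first similarity threshold.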
Step 105: determine, according to the similarity between segments that belong to the same cluster, whether duplicate page content exists.
Specifically, for segments that belong to the same cluster, the similarity of the segments is determined by comparing picture similarity and text similarity, and it is then determined whether duplicate page content exists.
The picture similarity between two segments is determined by template matching and the correlation coefficient method. Specifically, template matching finds the position of a target template within an image frame: the place most similar to the template is the target. All subregions of the full image are compared with the target template, and the subregion most similar to the template gives the position of the target. The similarity between the two is then measured by computing the correlation coefficient between the target template and the subregion.
Further, the text content in a region is recognized by optical character recognition (Optical Character Recognition, OCR); the text recognition module completes recognition by extracting the features of different sample Chinese characters.
In one possible case, for two segments in the same cluster, if the picture similarity between them is greater than the first similarity and their text similarity is greater than the second similarity, it is determined that duplicate page content exists.
In another possible case, for two segments in the same cluster, if the picture similarity between them is not greater than the first similarity, or their text similarity is not greater than the second similarity, it is determined that no duplicate page content exists.
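The two cases above amount to a two-threshold decision rule, sketched below. The default threshold values of 0.9 are placeholders, since the source gives no numbers, and the function name is hypothetical. The text similarity is passed as a callable so that text recognition runs only when the picture check has already passed, matching the order described earlier.

```python
def is_duplicate(picture_similarity, text_similarity_fn,
                 first_similarity=0.9, second_similarity=0.9):
    """Decide whether two segments in the same cluster are duplicate content.

    first_similarity and second_similarity are the picture and text
    thresholds; 0.9 is a placeholder, not a value from the source.
    """
    if picture_similarity <= first_similarity:
        return False  # pictures differ enough; text recognition is skipped
    return text_similarity_fn() > second_similarity


print(is_duplicate(0.95, lambda: 0.97))  # True: both checks pass
print(is_duplicate(0.95, lambda: 0.50))  # False: text differs
print(is_duplicate(0.50, lambda: 1.00))  # False: pictures differ
```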
The duplicate page content detection method of the embodiment of the present invention takes a screenshot of the interface of the page to be detected to obtain an image of the page; identifies segmentation regions in the image according to the grayscale information of the image; splits the image according to the segmentation regions to obtain multiple segments; clusters the segments according to their image features and determines the similarity between segments that belong to the same cluster; and determines, according to that similarity, whether duplicate page content exists. The method uses image processing to split the screenshot and compare the resulting segments, so repeated page content can be detected without preparing additional reference objects. This solves the technical problems in the prior art that manual detection depends on human experience and that automated verification requires a large number of additional reference objects at high testing cost, and it improves the efficiency of duplicate page content detection.
To explain the previous embodiment more clearly, and to describe more specifically how duplicate content detection is performed once the image to be detected has been obtained, this embodiment provides another duplicate page content detection method. Fig. 2 is a flow diagram of a duplicate page content detection method provided by Embodiment Two of the present invention.
As shown in Fig. 2, the duplicate page content detection method may include the following steps:
Step 201: take a screenshot of the interface of the page to be detected to obtain an image of the page.
Specifically, the image of the page to be detected is obtained by performing the screenshot operation appropriate to each kind of page to be detected.
In one possible case, for an APP, the screenshot of the page to be detected is taken on a mobile device; different operating systems use different operations to take the screenshot, and the image of the page to be detected is finally obtained.
In another possible case, when a web page is detected, the screenshot of the page to be detected is taken with a screenshot tool or screenshot software on a computer device to obtain the image of the page to be detected.
Step 202: determine the gray value of each pixel according to the grayscale information of the image, identify segmentation regions in the image, and merge adjacent segmentation regions.
Specifically, the grayscale information of an image is the gray value of each pixel in the image obtained after grayscale conversion. A pixel is the smallest unit of an image; an image consists of many pixels arranged in an array. The detection terminal determines the gray value of each pixel from the grayscale information of the image, and the identification module then identifies the segmentation regions in the image.
Further, the segmentation regions are identified in the image of the page to be detected as follows: the rows or columns of the pixel array are searched to find at least one row or at least one column of pixels with identical gray values, and the rows or columns found are taken as segmentation regions. It is then checked whether adjacent regions have similar features; if they do, the adjacent segmentation regions are merged.
Step 203, image is split according to cut zone, obtains multiple segments.
Specifically, image segmentation is that divide the image into several specific, with unique properties and propose mesh interested Target technology and process.Current main image partition method has Threshold segmentation, region segmentation, edge segmentation etc..The present embodiment In using the image partition method based on edge detection, basic ideas are the first edge pixels in determining image, then again These pixels are linked together and just constitute required zone boundary.It solves to divide by detecting the edge comprising different zones Problem, that is, detecting gray level or structure has the place of mutation, shows that the termination in region and another region start Place, this discontinuity is known as edge.Different gradation of images is different, and boundary generally has apparent edge, utilizes this Feature can divide image.
As a kind of mode in the cards, image to be detected can be split along the edge of the cut zone, Obtain multiple segments.Specifically, the gray value of edge pixel is discontinuous in image, and this discontinuity can pass through derivation It counts to detect.
As the mode of alternatively possible realization, cut-off rule can be determined inside cut zone, along cut-off rule, to be checked Altimetric image is split, and obtains multiple segments.
Step 204: the area ratio of each segment in the image is determined, and segments whose area ratio is below a threshold ratio are deleted.
Specifically, after the image to be detected has been segmented into multiple segments, the area ratio of each segment in the image is determined, and segments whose area ratio is below the threshold ratio are deleted. For example, regions whose height is less than 2 percent of the page height, or whose area is less than 2 per thousand (0.2 percent) of the page area, are deleted.
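A minimal sketch of this filter, using the 2 percent height and 0.2 percent area figures from the example above as default thresholds. The `(left, top, width, height)` tuple layout and the function name are assumptions made for illustration.

```python
def drop_small_segments(segments, page_w, page_h,
                        min_height_share=0.02, min_area_share=0.002):
    """Keep only segments that are large enough relative to the page.

    segments: iterable of (left, top, width, height) tuples in pixels
    (a hypothetical layout; the source does not define one).
    """
    kept = []
    for left, top, width, height in segments:
        height_share = height / page_h
        area_share = (width * height) / (page_w * page_h)
        if height_share < min_height_share or area_share < min_area_share:
            continue  # too small to be a meaningful content block
        kept.append((left, top, width, height))
    return kept


segments = [
    (0, 0, 1000, 30),     # height share 1.5 percent: dropped
    (0, 100, 1000, 400),  # kept
    (0, 600, 20, 50),     # area share 0.05 percent: dropped
    (0, 700, 500, 100),   # kept
]
print(drop_small_segments(segments, page_w=1000, page_h=2000))
```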
Step 205: the segments are clustered according to their image features, and the degree of similarity is determined for segments belonging to the same cluster.
It should be noted that, after the image to be detected has been segmented into multiple segments, each segment first undergoes color space conversion from RGB space to HSV space; color features are then extracted from the converted segment; finally, the extracted color features and the segment's area ratio in the image are used as its image features. The conversion from RGB space to HSV space is performed with a conversion formula in the same way as in Embodiment 1 and is not repeated in this embodiment.
Specifically, after color space conversion, color features are first extracted from the converted segment. For the HSV color space, a color histogram is generally used for feature extraction. Matching methods include the histogram intersection method, the distance method, the center distance method, the reference color table method, and the cumulative color histogram method.
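The HSV histogram feature and the histogram intersection match mentioned above can be sketched with the Python standard library. The bin counts (8 hue, 4 saturation, 4 value) and the function names are assumptions chosen for illustration; the text does not fix them.

```python
import colorsys


def hsv_histogram(pixels, bins=(8, 4, 4)):
    """Normalized HSV histogram of an iterable of (r, g, b) tuples in 0..255."""
    hb, sb, vb = bins
    hist = [0.0] * (hb * sb * vb)
    count = 0
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        # Clamp so that a component equal to 1.0 falls in the last bin.
        hi = min(int(h * hb), hb - 1)
        si = min(int(s * sb), sb - 1)
        vi = min(int(v * vb), vb - 1)
        hist[(hi * sb + si) * vb + vi] += 1
        count += 1
    return [x / count for x in hist] if count else hist


def histogram_intersection(a, b):
    """Sum of bin-wise minima; 1.0 for identical normalized histograms."""
    return sum(min(x, y) for x, y in zip(a, b))


red = hsv_histogram([(255, 0, 0)] * 4)
blue = hsv_histogram([(0, 0, 255)] * 4)
mixed = hsv_histogram([(255, 0, 0)] * 2 + [(0, 0, 255)] * 2)
print(histogram_intersection(red, red))    # 1.0
print(histogram_intersection(red, blue))   # 0.0
print(histogram_intersection(red, mixed))  # 0.5
```

The histogram, together with the segment's area ratio, would form the feature vector used for clustering.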
More specifically, the segments are clustered into groups according to the extracted image features, and the degree of similarity is determined for segments belonging to the same cluster. For two segments in the same cluster, the picture similarity between them is determined first. If the picture similarity exceeds a first similarity, text recognition is performed on each of the two segments to obtain the text content and text position. Finally, the text similarity of the two segments is determined from the text content and text position.
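The text does not name a clustering algorithm, so the following is purely an illustrative assumption: a greedy scheme that puts a segment into the first cluster whose first member lies within a Euclidean distance threshold of it, and otherwise opens a new cluster. The function name and the `max_distance` value are hypothetical.

```python
import math


def cluster_segments(features, max_distance=0.25):
    """Greedy threshold clustering (an assumed scheme, not taken from the source).

    features: list of equal-length numeric feature vectors, one per segment.
    Returns a list of clusters, each a list of indices into `features`.
    """
    clusters = []
    for i, feat in enumerate(features):
        for members in clusters:
            representative = features[members[0]]
            if math.dist(feat, representative) <= max_distance:
                members.append(i)
                break
        else:
            clusters.append([i])  # no close cluster found, start a new one
    return clusters


features = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (0.95, 1.0)]
print(cluster_segments(features))  # [[0, 1], [2, 3]]
```

Only pairs inside the same cluster then need the more expensive picture and text comparisons.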
The picture similarity between two segments is determined by template matching and the correlation coefficient method. Template matching means finding the position of a target template within a frame: the place most similar to the template is the target. All sub-regions of the full image are compared with the target template, and the sub-region most similar to the template gives the position of the target. The similarity of the two is then measured by computing the correlation coefficient between the target template and the sub-region. In this embodiment, a matrix is first generated for each of the two segments from the pixel values of its pixels; the picture similarity between the two segments is then determined from the correlation between the two matrices.
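As a sketch of the matrix correlation step, the Pearson correlation coefficient between two pixel matrices can be computed as below. It assumes both segments have already been brought to the same size, which the text does not discuss, and the handling of flat (constant) matrices is a choice made for this example only.

```python
import math


def matrix_correlation(a, b):
    """Pearson correlation coefficient between two equal-sized pixel matrices."""
    xs = [v for row in a for v in row]
    ys = [v for row in b for v in row]
    if len(xs) != len(ys) or not xs:
        raise ValueError("matrices must be non-empty and the same size")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A flat matrix has no variation; treat identical flat matrices as a match.
        return 1.0 if xs == ys else 0.0
    return cov / math.sqrt(var_x * var_y)


tile = [[0, 50], [100, 150]]
brighter = [[20, 70], [120, 170]]
inverted = [[150, 100], [50, 0]]
print(round(matrix_correlation(tile, brighter), 6))  # 1.0
print(round(matrix_correlation(tile, inverted), 6))  # -1.0
```

Because the coefficient is unaffected by a uniform brightness shift, two copies of the same block rendered at slightly different brightness still score close to 1.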
Step 206: whether duplicate page content exists is determined by evaluating the picture similarity and text similarity of two segments in the same cluster.
Specifically, for preprocessed segments belonging to the same cluster, the degree of similarity is determined by comparing picture similarity and text similarity, which in turn determines whether duplicate page content exists.
The picture similarity between two segments is determined by template matching and the correlation coefficient method. Template matching means finding the position of a target template within a frame: the place most similar to the template is the target. All sub-regions of the full image are compared with the target template, and the sub-region most similar to the template gives the position of the target. The similarity of the two is then measured by computing the correlation coefficient between the target template and the sub-region.
More specifically, the text content in a region is recognized by optical character recognition (OCR); the text recognition module performs recognition by extracting features from samples of different Chinese characters.
As one possible case, for two segments in the same cluster, if the picture similarity between them is greater than the first similarity and their text similarity is greater than a second similarity, it is determined that duplicate page content exists.
As another possible case, for two segments in the same cluster, if the picture similarity between them is not greater than the first similarity, or their text similarity is not greater than the second similarity, it is determined that no duplicate page content exists.
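The two cases above amount to a two-stage threshold test, which can be sketched as follows. The threshold values of 0.9 are placeholders, since the text gives no numbers for the first and second similarity, and passing the text similarity as a callable is an illustrative way of showing that text recognition is only needed once the picture similarity test has passed.

```python
def is_duplicate(picture_similarity, text_similarity,
                 first_similarity=0.9, second_similarity=0.9):
    """Two-stage duplicate test for a pair of segments in the same cluster.

    picture_similarity: precomputed picture similarity score.
    text_similarity: zero-argument callable returning the text similarity;
        it is only called when the picture similarity test passes.
    Threshold defaults are placeholders, not values from the source.
    """
    if picture_similarity <= first_similarity:
        return False  # pictures differ too much, text recognition is skipped
    return text_similarity() > second_similarity


ocr_calls = []


def fake_text_similarity():
    ocr_calls.append(1)  # record that the expensive step ran
    return 0.95


print(is_duplicate(0.97, fake_text_similarity))  # True
print(is_duplicate(0.50, fake_text_similarity))  # False
print(len(ocr_calls))                            # 1
```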
As a possible application scenario, when the method provided in this embodiment is executed, the duplicate page content detection method is the same but the detection steps may differ: steps may be merged, or one of the aforementioned steps may be split into several. This embodiment therefore also provides the method flow shown in Fig. 3, which is a schematic flowchart of another duplicate page content detection method provided by an embodiment of the present invention.
As shown in Fig. 3, step 301 is executed first: the picture to be detected is input to the detection terminal, the picture having been obtained by taking an interface screenshot of the page to be detected.
Next, step 302 is executed: page segmentation is performed on the picture to be detected. This step includes marking segmentation regions, merging segmentation regions, and determining dividing lines.
Specifically, after the picture to be detected is converted to grayscale, the gray level of each pixel is determined from the grayscale information of the image, and the identification module identifies segmentation regions in the image. Adjacent regions are checked for similar features and, if they have similar features, the adjacent segmentation regions are merged. Finally, the image to be detected can be segmented along the edges of the segmentation regions to obtain multiple segments, or a dividing line can be determined inside each segmentation region and the image segmented along the dividing lines to obtain multiple segments.
Further, step 303 is executed: the segmented image is preprocessed. This step includes filtering the segmented image, extracting region features, and clustering according to the extracted features.
Specifically, after the image to be detected has been segmented into multiple segments, the area ratio of each segment in the image is determined, and segments whose area ratio is below the threshold ratio are deleted. Each remaining segment then undergoes color space conversion from RGB space to HSV space, color features are extracted from the converted segment, and the extracted color features and the segment's area ratio in the image are used as its image features. Finally, the segments are clustered into groups according to the extracted image features.
Then step 304 is executed: the preprocessed picture is compared. Finally, step 305 is executed: the duplicate regions are output.
Specifically, the preprocessed picture is compared by performing picture similarity comparison and text similarity comparison on preprocessed segments belonging to the same cluster to determine their degree of similarity, which in turn determines whether duplicate page content exists. The duplicate regions of the page content are finally output by the detection terminal.
The detection steps shown in Fig. 3 are only briefly described above; the specific detection method is the same as the detection method described in this embodiment and is not repeated here.
The duplicate page content detection method of the embodiment of the present invention takes an interface screenshot of the page to be detected to obtain an image of the page; determines the gray level of each pixel from the grayscale information of the image, identifies segmentation regions in the image, and merges adjacent segmentation regions; segments the image according to the segmentation regions to obtain multiple segments; determines the area ratio of each segment in the image and deletes segments whose area ratio is below the threshold ratio; clusters the segments according to their image features and determines the degree of similarity for segments belonging to the same cluster; and determines whether duplicate page content exists by evaluating the picture similarity and text similarity of two segments in the same cluster. The method uses image processing to segment the image obtained from the interface screenshot and compares the resulting segments with one another, so duplicated page content can be detected without preparing any additional reference. This solves the technical problem in the prior art that manual detection depends on human experience and automated verification requires a large number of additional references, both of which make testing costly, and it improves the efficiency of duplicate page content detection.
As shown in Fig. 4, application results can be obtained by testing with the duplicate content detection method described in the above embodiments. Fig. 4 shows examples of duplicate page content found with the duplicate content detection method of the present invention. With this detection method, the three pictures in Fig. 4 are each found to contain duplicate content: the text in the two rectangular boxes in the upper left corner of the left picture is repeated, and the two pictures on the right contain repeated image content, which is abnormal.
To implement the above embodiments, the present invention also proposes a duplicate page content detection device.
Fig. 5 is a schematic structural diagram of a duplicate page content detection device provided by an embodiment of the present invention.
As shown in Fig. 5, the duplicate page content detection device includes an acquisition module 110, an identification module 120, a segmentation module 130, a clustering module 140, and a detection module 150.
The acquisition module 110 is configured to take an interface screenshot of the page to be detected to obtain an image of the page.
Specifically, the acquisition module 110 obtains the image of the page by taking an interface screenshot of the page to be detected. Mobile devices of different brands take screenshots with different operations; once the picture to be detected is obtained, it is input to the detection terminal.
The identification module 120 is configured to identify segmentation regions from the image according to the grayscale information of the image.
Specifically, after the picture to be detected is converted to grayscale, the gray value of each pixel in the image is obtained. The grayscale pixel array is searched along its rows or columns to find at least one row or at least one column of pixels with identical gray levels, and the rows or columns found are marked as segmentation regions. The identification module 120 thus identifies segmentation regions from the image according to the gray value of each pixel.
The segmentation module 130 is configured to segment the image according to the segmentation regions to obtain multiple segments.
Specifically, after the identification module 120 has identified the segmentation regions in the image, the segmentation module 130 segments the image. Image segmentation is the technique and process of dividing an image into several specific regions with distinctive properties and extracting the targets of interest.
This embodiment uses an image segmentation method based on edge detection. The basic idea is to first determine the edge pixels in the image and then link these pixels together to form the required region boundaries. The segmentation problem is solved by detecting the edges between different regions, that is, by detecting places where the gray level or structure changes abruptly, which mark where one region ends and another begins. This discontinuity is called an edge. Different parts of an image have different gray levels, and their boundaries generally show distinct edges; this property can be used to segment the image and obtain multiple segments.
The clustering module 140 is configured to cluster the segments according to their image features and to determine the degree of similarity for segments belonging to the same cluster.
Specifically, the image features of a segment are obtained as follows: after the image to be detected has been segmented into multiple segments, each segment first undergoes color space conversion from RGB space to HSV space; color features are then extracted from the converted segment; finally, the extracted color features and the segment's area ratio in the image are used as its image features. The clustering module 140 clusters the segments into groups according to their image features, so that segments in the same group have similar image features.
More specifically, for two segments in the same cluster, the clustering module 140 first determines the picture similarity between them. If the picture similarity exceeds the first similarity, text recognition is performed on each of the two segments to obtain the text content and text position. Finally, the text similarity of the two segments is determined from the text content and text position.
The detection module 150 is configured to determine whether duplicate page content exists according to the degree of similarity between segments belonging to the same cluster.
Specifically, for segments belonging to the same cluster, the detection module 150 determines their degree of similarity by comparing picture similarity and text similarity, which in turn determines whether duplicate page content exists.
As one possible case, for two segments in the same cluster, if the picture similarity between them is greater than the first similarity and their text similarity is greater than the second similarity, it is determined that duplicate page content exists.
As another possible case, for two segments in the same cluster, if the picture similarity between them is not greater than the first similarity, or their text similarity is not greater than the second similarity, it is determined that no duplicate page content exists.
The duplicate page content detection device of the embodiment of the present invention takes an interface screenshot of the page to be detected to obtain an image of the page; identifies segmentation regions from the image according to the grayscale information of the image; segments the image according to the segmentation regions to obtain multiple segments; clusters the segments according to their image features and determines the degree of similarity for segments belonging to the same cluster; and determines whether duplicate page content exists according to the degree of similarity between segments belonging to the same cluster. This approach uses image processing to segment the image obtained from the interface screenshot and compares the resulting segments with one another, so duplicated page content can be detected without preparing any additional reference. This solves the technical problem in the prior art that manual detection depends on human experience and automated verification requires a large number of additional references, both of which make testing costly, and it improves the efficiency of duplicate page content detection.
It should be noted that the foregoing explanation of the duplicate page content detection method embodiments also applies to the duplicate page content detection device of this embodiment and is not repeated here.
To implement the above embodiments, the present invention also proposes a computer device, including a processor and a memory for storing instructions executable by the processor.
The processor reads the executable program code stored in the memory and runs a program corresponding to the executable program code, so as to implement the duplicate page content detection method proposed by the present invention.
To implement the above embodiments, the present invention also proposes a computer-readable storage medium on which a computer program is stored; when executed by a processor, the program implements the duplicate page content detection method proposed in the embodiments of the first aspect above.
Fig. 6 shows a block diagram of an exemplary computer device suitable for implementing embodiments of the present application. The computer device 12 shown in Fig. 6 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present application.
As shown in Fig. 6, the computer device 12 takes the form of a general-purpose computing device. The components of the computer device 12 may include, but are not limited to: one or more processors or processing units 16, a system memory 28, and a bus 18 connecting the different system components (including the system memory 28 and the processing unit 16).
The bus 18 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
The computer device 12 typically includes a variety of computer-system-readable media. These media can be any available media that can be accessed by the computer device 12, including volatile and non-volatile media and removable and non-removable media.
The memory 28 may include computer-system-readable media in the form of volatile memory, such as random access memory (RAM) 30 and/or cache memory 32. The computer device 12 may further include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, the storage system 34 can be used to read from and write to a non-removable, non-volatile magnetic medium (not shown in Fig. 6, commonly called a "hard disk drive"). Although not shown in Fig. 6, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (such as a "floppy disk") can be provided, as can an optical disc drive for reading from and writing to a removable non-volatile optical disc (such as a compact disc read-only memory (CD-ROM), a digital video disc read-only memory (DVD-ROM), or other optical media). In these cases, each drive can be connected to the bus 18 through one or more data media interfaces. The memory 28 may include at least one program product having a set of (for example, at least one) program modules configured to perform the functions of the embodiments of the present application.
A program/utility 40 having a set of (at least one) program modules 42 may be stored, for example, in the memory 28. Such program modules 42 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination of them, may include an implementation of a network environment. The program modules 42 generally perform the functions and/or methods of the embodiments described in the present application.
The computer device 12 may also communicate with one or more external devices 14 (such as a keyboard, a pointing device, a display 24, etc.), with one or more devices that enable a user to interact with the computer device 12, and/or with any device (such as a network card, a modem, etc.) that enables the computer device 12 to communicate with one or more other computing devices. Such communication can take place through an input/output (I/O) interface 22. The computer device 12 can also communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through a network adapter 20. As shown, the network adapter 20 communicates with the other modules of the computer device 12 through the bus 18. It should be understood that, although not shown in the figure, other hardware and/or software modules may be used in conjunction with the computer device 12, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
The processing unit 16 runs the programs stored in the system memory 28 to execute various functional applications and data processing, for example to implement the methods described in the foregoing embodiments.
In the description of this specification, a description referring to the terms "one embodiment", "some embodiments", "an example", "a specific example", or "some examples" means that the specific features, structures, materials, or characteristics described in connection with that embodiment or example are included in at least one embodiment or example of the present invention. In this specification, schematic uses of these terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, provided they do not contradict each other, those skilled in the art may combine the different embodiments or examples described in this specification and the features of those embodiments or examples.
In addition, the terms "first" and "second" are used for descriptive purposes only and are not to be understood as indicating or implying relative importance or implicitly indicating the number of technical features referred to. A feature qualified by "first" or "second" may therefore explicitly or implicitly include at least one such feature. In the description of the present invention, "multiple" means at least two, for example two or three, unless otherwise specifically defined.
Any process or method description in a flowchart, or otherwise described herein, may be understood as representing a module, segment, or portion of code that includes one or more executable instructions for implementing the steps of a custom logic function or process. The scope of the preferred embodiments of the present invention includes additional implementations in which functions may be performed out of the order shown or discussed, including substantially concurrently or in the reverse order depending on the functions involved, as should be understood by those skilled in the art to which the embodiments of the present invention belong.
The logic and/or steps represented in a flowchart, or otherwise described herein, may for example be regarded as an ordered list of executable instructions for implementing logic functions, and may be embodied in any computer-readable medium for use by, or in connection with, an instruction execution system, apparatus, or device (such as a computer-based system, a system including a processor, or another system that can fetch instructions from an instruction execution system, apparatus, or device and execute them). For the purposes of this specification, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transmit a program for use by, or in connection with, an instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of computer-readable media include: an electrical connection (electronic device) with one or more wires, a portable computer diskette (magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM). The computer-readable medium may even be paper or another suitable medium on which the program is printed, because the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting, or otherwise processing it in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that each part of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, multiple steps or methods may be implemented with software or firmware that is stored in a memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, they may be implemented with any one or a combination of the following techniques well known in the art: a discrete logic circuit with logic gates for implementing logic functions on data signals, an application-specific integrated circuit with suitable combinational logic gates, a programmable gate array (PGA), a field-programmable gate array (FPGA), and so on.
Those of ordinary skill in the art will understand that all or some of the steps of the above method embodiments can be carried out by a program instructing the relevant hardware. The program can be stored in a computer-readable storage medium and, when executed, performs one of the steps of the method embodiments or a combination of them.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing module, may each exist physically on their own, or two or more units may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. If the integrated module is implemented as a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like. Although embodiments of the present invention have been shown and described above, it should be understood that the above embodiments are exemplary and are not to be construed as limiting the present invention; those of ordinary skill in the art may change, modify, replace, and vary the above embodiments within the scope of the present invention.

Claims (12)

1. a kind of duplicate pages content detection algorithm, which is characterized in that the described method comprises the following steps:
Interface sectional drawing is carried out to the page to be detected, obtains the image of the page;
According to the half-tone information of described image, cut zone is identified from described image;
Described image is split according to the cut zone, obtains multiple segments;
It according to the characteristics of image of the multiple segment, is clustered, and similarity degree is determined to belonging to the same segment to cluster;
According to the similarity degree belonged between the same segment to cluster, it is determined whether there are duplicate pages contents.
2. duplicate pages content detection algorithm according to claim 1, which is characterized in that the ash according to described image Information is spent, cut zone is identified from described image, including:
According to the half-tone information, the gray scale of each pixel in described image is determined;Wherein, in described image, each pixel-matrix Row arrangement;
In described image, the row or column along array scans for, and obtains the identical at least one-row pixels point of gray scale or at least one Row pixel, and using at least one-row pixels point searched or an at least row pixel as cut zone.
3. duplicate pages content detection algorithm according to claim 2, which is characterized in that at least one will searched After row pixel or at least a row pixel are as cut zone, further include:
Adjacent cut zone is merged.
4. duplicate pages content detection algorithm according to claim 2, which is characterized in that described according to the cut zone Described image is split, multiple segments are obtained, including:
Described image is split along the edge of the cut zone, obtains multiple segments;
Alternatively, determining that cut-off rule is split described image along the cut-off rule, obtains more inside the cut zone A segment.
5. according to claim 1-4 any one of them duplicate pages content detection algorithms, which is characterized in that described in the basis Cut zone is split described image, after obtaining multiple segments, further includes:
Determine area accounting of each segment in described image;
Delete the segment that area accounting is less than threshold value accounting.
6. according to claim 1-4 any one of them duplicate pages content detection algorithms, which is characterized in that described in the basis Cut zone is split described image, after obtaining multiple segments, further includes:
For each segment, color space conversion is carried out, is converted from rgb space to HSV space;
Color feature extracted is carried out to the transformed segment of color space;
Using the area accounting of the color characteristic extracted and the segment in the picture as characteristics of image.
7. according to claim 1-4 any one of them duplicate pages content detection algorithms, which is characterized in that described pair belongs to same One segment to cluster determines similarity degree, including:
To belonging to same two segments to cluster, the picture similarity degree between two segment is determined;
If the picture similarity between two segment is more than the first similarity, Text region is carried out respectively to two segment, Obtain word content and text point;
According to the word content and text point, the word similarity degree of two segment is determined.
8. The method for detecting duplicate page content according to claim 7, characterized in that determining, according to the degree of similarity between segments belonging to the same cluster, whether duplicate page content exists comprises:
for two segments belonging to the same cluster, if the picture similarity between the two segments is greater than the first similarity and the text similarity of the two segments is greater than a second similarity, determining that duplicate page content exists;
if the picture similarity between the two segments is not greater than the first similarity, or the text similarity of the two segments is not greater than the second similarity, determining that no duplicate page content exists.
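The two-threshold decision can be sketched as follows. The threshold values and the name `is_duplicate` are assumptions; passing the text comparison as a callable is a design choice that mirrors claim 7, where text recognition is only performed once the picture similarity has exceeded the first similarity.

```python
def is_duplicate(picture_sim, text_sim_fn, first_threshold=0.9, second_threshold=0.8):
    """Return True only if both the picture and the text similarity exceed their thresholds.

    text_sim_fn: zero-argument callable that runs text recognition and comparison.
    It is called only when the picture similarity check has passed.
    """
    if picture_sim <= first_threshold:
        return False  # text recognition is skipped entirely
    return text_sim_fn() > second_threshold


if __name__ == "__main__":
    print(is_duplicate(0.95, lambda: 0.85))  # both checks pass
    print(is_duplicate(0.95, lambda: 0.50))  # text differs
    print(is_duplicate(0.40, lambda: 0.99))  # pictures differ, text never compared
```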
9. The method for detecting duplicate page content according to claim 7, characterized in that determining the degree of picture similarity between the two segments comprises:
generating, for each of the two segments, a corresponding matrix according to the pixel value of each pixel in the segment;
determining the degree of picture similarity between the two segments according to the correlation between the matrices corresponding to the two segments.
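The claim does not name a particular correlation measure. The sketch below uses the Pearson correlation coefficient over two equally sized grayscale matrices as one common choice; that choice, the name `matrix_correlation`, and the handling of flat (constant) segments are assumptions.

```python
from math import sqrt


def matrix_correlation(a, b):
    """Pearson correlation between two same-sized matrices given as lists of rows."""
    xs = [v for row in a for v in row]
    ys = [v for row in b for v in row]
    if not xs or len(xs) != len(ys):
        raise ValueError("matrices must be non-empty and the same size")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # Correlation is undefined for a constant matrix; fall back to exact equality.
        return 1.0 if xs == ys else 0.0
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    checker = [[0, 255], [255, 0]]
    inverted = [[255, 0], [0, 255]]
    print(matrix_correlation(checker, checker))
    print(matrix_correlation(checker, inverted))
```

Segments of different sizes would need to be resized to a common shape before this comparison, a step that is not shown here.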
10. A device for detecting duplicate page content, characterized in that the device comprises:
an acquisition module, configured to take an interface screenshot of a page to be detected to obtain an image of the page;
an identification module, configured to identify a segmentation region in the image according to grayscale information of the image;
a segmentation module, configured to segment the image according to the segmentation region to obtain a plurality of segments;
a clustering module, configured to cluster the plurality of segments according to their image features and to determine the degree of similarity between segments belonging to the same cluster;
a detection module, configured to determine, according to the degree of similarity between segments belonging to the same cluster, whether duplicate page content exists.
11. A computer device, characterized by comprising: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the method for detecting duplicate page content according to any one of claims 1-9.
12. A computer-readable storage medium having a computer program stored thereon, characterized in that the program, when executed by a processor, implements the method for detecting duplicate page content according to any one of claims 1-9.
CN201810545595.7A 2018-05-25 2018-05-25 Method and device for detecting repeated page content Active CN108764352B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810545595.7A CN108764352B (en) 2018-05-25 2018-05-25 Method and device for detecting repeated page content

Publications (2)

Publication Number Publication Date
CN108764352A true CN108764352A (en) 2018-11-06
CN108764352B CN108764352B (en) 2022-09-27

Family

ID=64000956

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810545595.7A Active CN108764352B (en) 2018-05-25 2018-05-25 Method and device for detecting repeated page content

Country Status (1)

Country Link
CN (1) CN108764352B (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109615017A (en) * 2018-12-21 2019-04-12 大连海事大学 Consider the Stack Overflow replication problem detection method of more reference factors
CN109670507A (en) * 2018-11-27 2019-04-23 维沃移动通信有限公司 Image processing method, device and mobile terminal
CN109739752A (en) * 2018-12-21 2019-05-10 北京城市网邻信息技术有限公司 Built-in resource testing method, apparatus, electronic equipment and readable storage medium storing program for executing
CN110147516A (en) * 2019-04-15 2019-08-20 深圳壹账通智能科技有限公司 The intelligent identification Method and relevant device of front-end code in Pages Design
CN110532188A (en) * 2019-08-30 2019-12-03 北京三快在线科技有限公司 The method and apparatus of page presentation test
CN110716778A (en) * 2019-09-10 2020-01-21 阿里巴巴集团控股有限公司 Application compatibility testing method, device and system
WO2020177584A1 (en) * 2019-03-01 2020-09-10 华为技术有限公司 Graphic typesetting method and related device
CN112527282A (en) * 2020-12-18 2021-03-19 平安银行股份有限公司 Front-end page checking method and device, electronic equipment and storage medium

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070025617A1 (en) * 2005-06-09 2007-02-01 Canon Kabushiki Kaisha Image processing method and apparatus
CN101145902A (en) * 2007-08-17 2008-03-19 东南大学 Fishing webpage detection method based on image processing
US20080137954A1 (en) * 2006-12-12 2008-06-12 Yichuan Tang Method And Apparatus For Identifying Regions Of Different Content In An Image
CN101859309A (en) * 2009-04-07 2010-10-13 慧科讯业有限公司 System and method for identifying repeated text
CN105022752A (en) * 2014-04-29 2015-11-04 中国电信股份有限公司 Image retrieval method and apparatus
CN105404683A (en) * 2015-11-30 2016-03-16 北大方正集团有限公司 Format file processing method and apparatus
CN105678814A (en) * 2016-01-05 2016-06-15 武汉大学 Method for detecting repetitive texture of building facade image in combination with phase correlation analysis
CN106156749A (en) * 2016-07-25 2016-11-23 福建星网锐捷安防科技有限公司 Method for detecting human face based on selective search and device

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109670507A (en) * 2018-11-27 2019-04-23 维沃移动通信有限公司 Image processing method, device and mobile terminal
CN109615017A (en) * 2018-12-21 2019-04-12 大连海事大学 Consider the Stack Overflow replication problem detection method of more reference factors
CN109739752A (en) * 2018-12-21 2019-05-10 北京城市网邻信息技术有限公司 Built-in resource testing method, apparatus, electronic equipment and readable storage medium storing program for executing
CN109739752B (en) * 2018-12-21 2022-10-25 北京城市网邻信息技术有限公司 Built-in resource testing method and device, electronic equipment and readable storage medium
CN109615017B (en) * 2018-12-21 2021-06-29 大连海事大学 Stack Overflow repeated problem detection method considering multiple reference factors
WO2020177584A1 (en) * 2019-03-01 2020-09-10 华为技术有限公司 Graphic typesetting method and related device
US11790584B2 (en) 2019-03-01 2023-10-17 Huawei Technologies Co., Ltd. Image and text typesetting method and related apparatus thereof
CN110147516A (en) * 2019-04-15 2019-08-20 深圳壹账通智能科技有限公司 The intelligent identification Method and relevant device of front-end code in Pages Design
CN110532188B (en) * 2019-08-30 2021-06-29 北京三快在线科技有限公司 Page display test method and device
CN110532188A (en) * 2019-08-30 2019-12-03 北京三快在线科技有限公司 The method and apparatus of page presentation test
CN110716778A (en) * 2019-09-10 2020-01-21 阿里巴巴集团控股有限公司 Application compatibility testing method, device and system
CN110716778B (en) * 2019-09-10 2023-09-26 创新先进技术有限公司 Application compatibility testing method, device and system
CN112527282A (en) * 2020-12-18 2021-03-19 平安银行股份有限公司 Front-end page checking method and device, electronic equipment and storage medium
CN112527282B (en) * 2020-12-18 2023-11-07 平安银行股份有限公司 Front-end page verification method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN108764352B (en) 2022-09-27

Similar Documents

Publication Publication Date Title
CN108764352A (en) Duplicate pages content detection algorithm and device
US10817741B2 (en) Word segmentation system, method and device
CN109657673B (en) Image recognition method and terminal
CN107093172A (en) character detecting method and system
EP2323069A2 (en) Method, device and system for content based image categorization field
CN107292307B (en) Automatic identification method and system for inverted Chinese character verification code
CN104899586A (en) Method for recognizing character contents included in image and device thereof
CN108960382A (en) A kind of colour barcode and its color calibration method
CN111738252B (en) Text line detection method, device and computer system in image
CN111259891B (en) Method, device, equipment and medium for identifying identity card in natural scene
Shafait et al. Pixel-accurate representation and evaluation of page segmentation in document images
CN113569863B (en) Document checking method, system, electronic equipment and storage medium
KR20200020305A (en) Method and Apparatus for character recognition
CN109858570A (en) Image classification method and system, computer equipment and medium
Dutta et al. Multi-lingual text localization from camera captured images based on foreground homogenity analysis
CN112507923A (en) Certificate copying detection method and device, electronic equipment and medium
CN103136536A (en) System and method for detecting target and method for exacting image features
CN116012860B (en) Teacher blackboard writing design level diagnosis method and device based on image recognition
Zhang et al. Computational method for calligraphic style representation and classification
WO2023159771A1 (en) Rpa and ai-based invoice processing method and apparatus, device, and medium
Lin et al. Multilingual corpus construction based on printed and handwritten character separation
CN112861861B (en) Method and device for recognizing nixie tube text and electronic equipment
CN103136524A (en) Object detecting system and method capable of restraining detection result redundancy
JP2003087562A (en) Image processor and image processing method
CN113807315A (en) Method, device, equipment and medium for constructing recognition model of object to be recognized

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant