CN108764352A - Duplicate pages content detection algorithm and device - Google Patents
Duplicate pages content detection algorithm and device
- Publication number
- CN108764352A CN201810545595.7A
- Authority
- CN
- China
- Prior art keywords
- segment
- image
- duplicate pages
- similarity
- described image
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/11—Region-based segmentation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/267—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Bioinformatics & Cheminformatics (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Multimedia (AREA)
- Image Analysis (AREA)
Abstract
The present invention proposes a duplicate page content detection method and device. The method includes: taking an interface screenshot of the page to be detected to obtain an image of the page; identifying segmentation regions in the image according to its grayscale information; splitting the image according to the segmentation regions to obtain multiple segments; clustering the segments according to their image features and determining the degree of similarity between segments that belong to the same cluster; and determining, according to that degree of similarity, whether duplicate page content exists. The method uses image processing to split the screenshot and compare the resulting segments, so repeated page content can be detected. This addresses the technical problem that automated verification in the prior art requires a large number of additional reference objects and therefore has a high testing cost, and it improves the efficiency of duplicate page content detection.
Description
Technical field
The present invention relates to the technical field of mobile terminals, and in particular to a duplicate page content detection method and device.
Background
Front-end pages such as application (APP) pages and web pages can display repeated content, so testing for duplicate page content has become an important part of testing. In actual use, repeated page content not only seriously degrades the user experience but also wastes network resources.
In the prior art, repeated page content is still detected manually, and detection accuracy depends on the tester's experience. Automated verification techniques have also appeared, but they require reference objects to be prepared in advance, against which the page content is compared. This approach is limited by the available reference objects, so its ability to find duplicate page content is weak; improving detection accuracy requires preparing a large number of additional reference objects, which makes testing costly. In short, duplicate page content detection in the prior art is inefficient.
Summary of the invention
The present invention aims to solve, at least to some extent, one of the technical problems in the related art.
To this end, the present invention proposes a duplicate page content detection method that completes duplicate page content detection efficiently. It addresses the technical problems that manual detection in the prior art depends on human experience and that automated verification requires a large number of additional reference objects at high testing cost, and it improves the efficiency of duplicate page content detection.
The present invention also proposes a duplicate page content detection device.
The present invention also proposes a computer device.
The present invention also proposes a computer-readable storage medium.
An embodiment of the first aspect of the present invention proposes a duplicate page content detection method, including:
Taking an interface screenshot of the page to be detected to obtain an image of the page;
Identifying segmentation regions in the image according to the grayscale information of the image;
Splitting the image according to the segmentation regions to obtain multiple segments;
Clustering the segments according to their image features, and determining the degree of similarity between segments that belong to the same cluster;
Determining, according to the degree of similarity between segments that belong to the same cluster, whether duplicate page content exists.
In the duplicate page content detection method of this embodiment, an interface screenshot of the page to be detected is taken to obtain an image of the page; segmentation regions are identified in the image according to its grayscale information; the image is split according to the segmentation regions to obtain multiple segments; the segments are clustered according to their image features, and the degree of similarity is determined between segments that belong to the same cluster; and whether duplicate page content exists is determined according to that degree of similarity. The method uses image processing to split the screenshot and compare the segments with one another, so duplicate page content can be detected without preparing additional reference objects. This addresses the technical problems that manual detection in the prior art depends on human experience and that automated verification requires a large number of additional reference objects at high testing cost, and it improves the efficiency of duplicate page content detection.
An embodiment of the second aspect of the present invention proposes a duplicate page content detection device, including:
An acquisition module, configured to take an interface screenshot of the page to be detected to obtain an image of the page;
An identification module, configured to identify segmentation regions in the image according to the grayscale information of the image;
A segmentation module, configured to split the image according to the segmentation regions to obtain multiple segments;
A cluster module, configured to cluster the segments according to their image features and to determine the degree of similarity between segments that belong to the same cluster;
A detection module, configured to determine, according to the degree of similarity between segments that belong to the same cluster, whether duplicate page content exists.
The duplicate page content detection device of this embodiment takes an interface screenshot of the page to be detected to obtain an image of the page; identifies segmentation regions in the image according to its grayscale information; splits the image according to the segmentation regions to obtain multiple segments; clusters the segments according to their image features and determines the degree of similarity between segments that belong to the same cluster; and determines, according to that degree of similarity, whether duplicate page content exists. The device uses image processing to split the screenshot and compare the segments with one another, so duplicate page content can be detected without preparing additional reference objects. This addresses the technical problems that manual detection in the prior art depends on human experience and that automated verification requires a large number of additional reference objects at high testing cost, and it improves the efficiency of duplicate page content detection.
An embodiment of the third aspect of the present invention proposes a computer device, including: a processor; and a memory for storing instructions executable by the processor; wherein the processor reads the executable program code stored in the memory and runs a program corresponding to that code, so as to execute the duplicate page content detection method described in the embodiment of the first aspect.
An embodiment of the fourth aspect of the present invention proposes a computer-readable storage medium; when the instructions in the storage medium are executed by a processor, the duplicate page content detection method described in the embodiment of the first aspect is executed.
Additional aspects and advantages of the present invention will be set forth in part in the following description, and in part will become apparent from that description or be learned through practice of the invention.
Brief description of the drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily understood from the following description of the embodiments taken in conjunction with the accompanying drawings, in which:
Fig. 1 is a flow diagram of a duplicate page content detection method provided by an embodiment of the present invention;
Fig. 2 is a flow diagram of a duplicate page content detection method provided by Embodiment Two of the present invention;
Fig. 3 is a flow diagram of another duplicate page content detection method provided by an embodiment of the present invention;
Fig. 4 is an example of significant problems found in mobile products using the duplicate page content detection method of the present invention;
Fig. 5 is a structural diagram of a duplicate page content detection device provided by an embodiment of the present invention; and
Fig. 6 is a block diagram of an exemplary computer device suitable for implementing embodiments of the present application.
Detailed description
Embodiments of the present invention are described in detail below, and examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numerals throughout denote the same or similar elements or elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary; they are intended to explain the present invention and should not be construed as limiting it.
At present, mobile terminal devices are used ever more widely in daily life. Users find that front-end pages such as APP pages and web pages sometimes show repeated content, which leads to a poor user experience, so duplicate page content testing has become an important part of testing. In the prior art, duplicate page content is detected by image comparison against reference objects: accurate screenshots are taken manually at each step of a preset automation track, and the reference screenshot for every step must match the corresponding step of the automated run exactly. In practice, however, this traditional manual detection technique has obvious limitations and cannot find duplicate page content effectively.
To address the limitations and unsatisfactory results of duplicate page content detection in the prior art described above, the embodiments of the present invention detect repeated page content by image processing, and no additional reference object is needed during detection. After a mobile phone screenshot is obtained, the page is first split into multiple segments; the segments are then clustered according to their image features, and the degree of similarity is determined between segments that belong to the same cluster; finally, it is determined whether duplicate page content exists. For ease of understanding, two essential terms used in the present invention are introduced first:
First similarity: the threshold on the picture similarity degree above which duplicate page content is considered to exist between two segments.
Second similarity: the threshold on the text similarity degree above which duplicate page content is considered to exist between two segments.
The method and device of the embodiments of the present invention are described below with reference to the accompanying drawings.
Fig. 1 is a flow diagram of a duplicate page content detection method provided by an embodiment of the present invention.
As shown in Fig. 1, the duplicate page content detection method includes the following steps:
Step 101: take an interface screenshot of the page to be detected to obtain an image of the page.
Specifically, before page content can be checked, an image of the page to be detected must be obtained. In this embodiment, the interface screenshot is taken on a mobile device; mobile devices running different operating systems use different operations to take a screenshot, and the result is the image of the page to be detected.
Step 102: identify segmentation regions in the image according to the grayscale information of the image.
It should be noted that grayscale represents an object using shades of black, that is, black is the reference color and the image is displayed in black of different saturations. Each grayscale object has a brightness value ranging from 0% (white) to 100% (black). The grayscale information of an image is the gray value of each pixel in the image obtained after grayscale conversion. A pixel is the smallest unit of an image; an image is composed of many pixels arranged in an array.
Further, the color of each pixel in a color image is determined by three components, R, G, and B, and each component can take 256 values (0 to 255), so a single pixel can take more than 16 million colors (256 × 256 × 256). A grayscale image is a special color image whose R, G, and B components are equal, so a pixel can take only 256 values. For this reason, digital image processing generally converts images of various formats to grayscale first, to reduce the amount of computation in later steps. Like a color image, a grayscale image still reflects the overall and local distribution and characteristics of chroma and brightness levels across the whole image.
Because the image of the page to be detected in this embodiment is in color, the image is first converted to grayscale after it is obtained, which yields its grayscale information; the segmentation regions are then identified in the image. Grayscale conversion is the process of transforming a color image into a grayscale image.
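The grayscale conversion described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the source does not say how R, G, and B are combined, so the ITU-R BT.601 luma weights and the function name `to_gray` are assumptions.

```python
def to_gray(rgb_image):
    """Convert a list of rows of (R, G, B) tuples into rows of gray values in 0..255."""
    gray = []
    for row in rgb_image:
        # Assumed weighting (ITU-R BT.601 luma); the source does not specify one.
        gray.append([round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row])
    return gray


# A tiny 2 x 2 "screenshot": white, black, pure red, pure blue.
image = [[(255, 255, 255), (0, 0, 0)],
         [(255, 0, 0), (0, 0, 255)]]
gray = to_gray(image)
```

Each pixel shrinks from three components to one, which is the reduction in computation the text refers to.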
Optionally, the segmentation regions are identified in the image of the page to be detected as follows: search along the rows or columns of the pixel array for at least one row or at least one column of pixels with identical gray values, take the row or column found as a segmentation region, and then merge adjacent segmentation regions.
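A minimal sketch of this search, restricted to rows for brevity, might look like the following. The function name `find_separator_bands` is hypothetical, and merging adjacent uniform rows into one band regardless of their gray value is an assumption about how the merge is done.

```python
def find_separator_bands(gray):
    """Return (start, end) row ranges, end exclusive, in which every row has one uniform gray value."""
    bands = []
    start = None
    for y, row in enumerate(gray):
        uniform = len(set(row)) == 1      # all pixels in this row share one gray value
        if uniform and start is None:
            start = y                     # a new band of uniform rows begins
        elif not uniform and start is not None:
            bands.append((start, y))      # the band ended on the previous row
            start = None
    if start is not None:
        bands.append((start, len(gray)))  # a band runs to the bottom of the image
    return bands


gray = [[255, 255, 255],
        [255, 255, 255],
        [10, 20, 30],
        [0, 0, 0],
        [5, 5, 9]]
bands = find_separator_bands(gray)
```

Columns can be handled with the same function by passing the transposed image, for example `find_separator_bands([list(c) for c in zip(*gray)])`.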
Step 103: split the image according to the segmentation regions to obtain multiple segments.
Specifically, image segmentation is the technique and process of dividing an image into several specific regions with distinct properties and extracting the objects of interest. The main image segmentation methods at present include threshold segmentation, region segmentation, and edge segmentation. This embodiment uses segmentation based on edge detection: the edge pixels in the image are determined first, and these pixels are then connected to form the required region boundaries. The segmentation problem is solved by detecting the edges between different regions, that is, the places where the gray level or structure changes abruptly, which mark where one region ends and another begins. This discontinuity is called an edge. Different regions have different gray levels and their boundaries generally show distinct edges, and this property can be used to split the image.
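As a hedged illustration of detecting such abrupt gray-level changes, the sketch below compares each row with the next and reports a horizontal edge where the mean absolute difference is large. The function name `horizontal_edges` and the threshold of 40 gray levels are assumptions; the source does not name a specific edge detector.

```python
def horizontal_edges(gray, threshold=40):
    """Return row indices y at which a horizontal edge lies between row y - 1 and row y."""
    edges = []
    for y in range(len(gray) - 1):
        # First difference along the vertical direction, averaged over the row.
        diffs = [abs(a - b) for a, b in zip(gray[y], gray[y + 1])]
        if sum(diffs) / len(diffs) > threshold:
            edges.append(y + 1)
    return edges


gray = [[200, 200],
        [200, 200],
        [20, 20],
        [20, 25]]
edges = horizontal_edges(gray)
```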
In one possible implementation, the image to be detected is split along the edges of the segmentation regions to obtain multiple segments. Specifically, the gray values of edge pixels in the image are discontinuous, and this discontinuity can be detected by taking derivatives.
In another possible implementation, a dividing line is determined inside each segmentation region, and the image to be detected is split along the dividing lines to obtain multiple segments.
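The two splitting options can be sketched as below, assuming the segmentation regions are horizontal bands given as (start, end) row ranges with an exclusive end. The function name `split_rows`, the `mode` parameter, and the choice of the band's middle row as the dividing line are all illustrative assumptions.

```python
def split_rows(height, bands, mode="edges"):
    """Return (top, bottom) row ranges, bottom exclusive, of the segments between separator bands."""
    if mode == "edges":
        # Cut along both edges of each band, so the band itself is dropped.
        segments = []
        cursor = 0
        for start, end in bands:
            if start > cursor:
                segments.append((cursor, start))
            cursor = end
        if cursor < height:
            segments.append((cursor, height))
        return segments
    # Otherwise cut along a dividing line inside each band (here: its middle row).
    cuts = [0] + [(start + end) // 2 for start, end in bands] + [height]
    return [(a, b) for a, b in zip(cuts, cuts[1:]) if b > a]


bands = [(3, 5), (8, 10)]
by_edges = split_rows(10, bands, mode="edges")
by_midline = split_rows(10, bands, mode="midline")
```

Splitting along edges discards the uniform bands, while splitting along a dividing line keeps every row in some segment.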
It should be noted that after the image to be detected has been split into multiple segments, the proportion of the image area occupied by each segment must also be determined, and segments whose area proportion is below a threshold proportion are deleted. Segments with a very small area proportion have little influence on clustering, so they are removed.
Step 104: cluster the segments according to their image features, and determine the degree of similarity between segments that belong to the same cluster.
It should be noted that after the image to be detected has been split into multiple segments, each segment first undergoes color space conversion from RGB space to HSV space; color features are then extracted from the converted segment; finally, the extracted color features and the proportion of the image area occupied by the segment together serve as the segment's image features. The segments are clustered into groups according to these image features, so that segments in the same group have similar image features.
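The source does not name a clustering algorithm, so the following is only a sketch under stated assumptions: each segment is described by a small feature vector (for example mean hue, mean saturation, and area proportion, all scaled to [0, 1]), and a greedy threshold rule stands in for whatever clustering the authors used. The function name `cluster_segments` and the `max_distance` value are hypothetical.

```python
import math


def cluster_segments(features, max_distance=0.2):
    """Greedily group segment indices whose feature vectors lie close to a cluster's first member."""
    clusters = []  # each cluster is a list of segment indices
    for i, feat in enumerate(features):
        for cluster in clusters:
            # Compare against the first member, which acts as the cluster representative.
            if math.dist(feat, features[cluster[0]]) <= max_distance:
                cluster.append(i)
                break
        else:
            clusters.append([i])  # no cluster was close enough, so start a new one
    return clusters


features = [(0.10, 0.50, 0.20),
            (0.12, 0.50, 0.21),
            (0.90, 0.10, 0.05)]
clusters = cluster_segments(features)
```

Only segments that land in the same cluster are compared pairwise later, which keeps the number of detailed comparisons small.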
Specifically, a color image can be expressed in many color space models; image processing commonly uses the RGB model and the HSV model. The RGB model is a color space built on the three primary colors of human vision, red (R), green (G), and blue (B); it holds that suitable mixtures of red, green, and blue light can produce the perception of any color in the visible spectrum. The HSV model is a color space built on the characteristics of human visual perception: hue (H) distinguishes colors such as red, green, and blue; saturation (S) expresses the depth of a color, such as dark blue versus light blue; and value (V) expresses how light or dark a color is, from very bright to very dark. The conversion from RGB space to HSV space is carried out by a conversion formula, which has several equivalent forms based on the same principle. In the formula used in this embodiment, R, G, and B each take values in [0, 255]; H takes values in [0, 360]; S takes values in [0, 1]; and V takes values in [0, 255]. In practical image processing, H, S, and V are usually normalized to [0, 1].
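As an illustration of the value ranges just listed, the sketch below uses Python's standard `colorsys` module, which works on values normalized to [0, 1], and rescales its output. It shows the common RGB-to-HSV conversion and is not necessarily the exact formula of the embodiment; the wrapper name `rgb_to_hsv_ranges` is hypothetical.

```python
import colorsys


def rgb_to_hsv_ranges(r, g, b):
    """Convert 8-bit R, G, B to H in degrees [0, 360), S in [0, 1], V in [0, 255]."""
    # colorsys expects and returns values normalized to [0, 1].
    h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    return h * 360, s, v * 255


red = rgb_to_hsv_ranges(255, 0, 0)
blue = rgb_to_hsv_ranges(0, 0, 255)
grey = rgb_to_hsv_ranges(128, 128, 128)
```

Pure red has hue 0 and pure blue has hue 240 degrees, both fully saturated, while a neutral grey has zero saturation.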
It should be noted that after color space conversion, color features are extracted from the converted segment. For the HSV color space, color histograms are generally used for feature extraction, and the matching methods include histogram intersection, the distance method, the center distance method, the reference color table method, and the cumulative color histogram method. Converting from the RGB color space to the HSV color space separates hue, saturation, and value, which makes it easier to judge picture similarity accurately from color features.
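Of the matching methods listed, histogram intersection is the simplest to illustrate. The sketch below builds a normalized hue histogram and intersects two of them; the bin count of 8 and the function names are assumptions, and a fuller feature would also cover saturation and value.

```python
def hue_histogram(hues, bins=8):
    """Normalized histogram of hue values given in degrees."""
    counts = [0] * bins
    for h in hues:
        index = min(int(h % 360 / 360 * bins), bins - 1)  # clamp as a guard against rounding
        counts[index] += 1
    total = len(hues)
    return [c / total for c in counts]


def histogram_intersection(p, q):
    """Sum of bin-wise minima; equals 1.0 for identical normalized histograms."""
    return sum(min(a, b) for a, b in zip(p, q))


hist_a = hue_histogram([0, 10, 200, 210])
hist_b = hue_histogram([5, 100, 205, 220])
score = histogram_intersection(hist_a, hist_b)
```

A score near 1 means the two segments have nearly the same color distribution, and a score near 0 means they share almost no colors.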
Further, the segments are clustered into groups according to the extracted image features, and the degree of similarity is determined between segments that belong to the same cluster. Specifically, for two segments in the same cluster, the picture similarity degree between them is determined first. If the picture similarity is greater than the first similarity, text recognition is then performed on each of the two segments to obtain the text content and text position. Finally, the text similarity degree of the two segments is determined from the text content and text position.
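The source does not say how text content and text position are combined, so the following is a sketch under assumptions: content similarity is taken from `difflib.SequenceMatcher`, positions are (x, y) offsets relative to each segment in [0, 1], and texts that sit in clearly different places score zero. The function name `text_similarity` and the `max_offset` value are hypothetical, and the recognized text is assumed to be already available from OCR.

```python
from difflib import SequenceMatcher


def text_similarity(text_a, pos_a, text_b, pos_b, max_offset=0.1):
    """Content similarity in [0, 1], or 0.0 when the relative text positions differ too much."""
    dx = abs(pos_a[0] - pos_b[0])
    dy = abs(pos_a[1] - pos_b[1])
    if dx > max_offset or dy > max_offset:
        return 0.0  # assumed rule: text placed differently does not count as repeated
    return SequenceMatcher(None, text_a, text_b).ratio()


same = text_similarity("Hot deals today", (0.10, 0.20), "Hot deals today", (0.10, 0.22))
moved = text_similarity("Hot deals today", (0.10, 0.20), "Hot deals today", (0.60, 0.20))
close = text_similarity("abcd", (0.0, 0.0), "abcf", (0.0, 0.0))
```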
The picture similarity degree between two segments is determined by template matching with the correlation coefficient method. Template matching finds the position of a target template within an image: the place most similar to the template is the target. All sub-regions of the full image are compared with the target template, and the sub-region most similar to the template gives the position of the target. The degree of similarity between the two is then measured by computing the correlation coefficient between the target template and the sub-region. In this embodiment, a matrix is first generated for each segment from the pixel values of its pixels; the picture similarity degree between the two segments is then determined from the correlation between the two matrices.
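The matrix correlation step can be illustrated with the Pearson correlation coefficient between two gray-value matrices. This sketch assumes both segments have already been brought to the same size, and the handling of flat (constant) segments is an assumption; the function name `correlation` is hypothetical.

```python
import math


def correlation(a, b):
    """Pearson correlation coefficient between two equally sized gray-value matrices."""
    xs = [v for row in a for v in row]
    ys = [v for row in b for v in row]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # Correlation is undefined for a flat segment; assume identical flat segments match.
        return 1.0 if xs == ys else 0.0
    return cov / math.sqrt(var_x * var_y)


seg_a = [[10, 20], [30, 40]]
seg_b = [[40, 30], [20, 10]]
same = correlation(seg_a, seg_a)
inverted = correlation(seg_a, seg_b)
```

A coefficient close to 1 indicates that the two segments have the same pixel pattern, which is then compared with the first similarity threshold.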
Step 105: determine, according to the degree of similarity between segments that belong to the same cluster, whether duplicate page content exists.
Specifically, for segments that belong to the same cluster, the degree of similarity is determined by comparing picture similarity and text similarity, and this in turn determines whether duplicate page content exists.
As described for step 104, the picture similarity degree between two segments is determined by template matching with the correlation coefficient method, that is, by finding the sub-region most similar to the target template and measuring the similarity with the correlation coefficient between the template and that sub-region.
Further, the text content of a region is recognized by optical character recognition (OCR); the text recognition module completes recognition by extracting features of different sample Chinese characters.
In one possible case, for two segments in the same cluster, if the picture similarity between them is greater than the first similarity and their text similarity degree is greater than the second similarity, it is determined that duplicate page content exists.
In another possible case, for two segments in the same cluster, if the picture similarity between them is not greater than the first similarity, or their text similarity degree is not greater than the second similarity, it is determined that no duplicate page content exists.
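The two cases above reduce to a single rule, sketched below. The threshold values of 0.9 are placeholders chosen for illustration; the source does not give numeric values for the first and second similarity.

```python
def is_duplicate(picture_similarity, text_similarity,
                 first_similarity=0.9, second_similarity=0.9):
    """Two segments count as duplicate content only if both similarities exceed their thresholds."""
    return (picture_similarity > first_similarity
            and text_similarity > second_similarity)


both_high = is_duplicate(0.95, 0.95)
text_low = is_duplicate(0.95, 0.50)
at_threshold = is_duplicate(0.90, 0.95)
```

Because the comparison is strict, a similarity exactly equal to its threshold does not count as duplicate content, which matches the "not greater than" wording of the second case.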
In the duplicate page content detection method of this embodiment, an interface screenshot of the page to be detected is taken to obtain an image of the page; segmentation regions are identified in the image according to its grayscale information; the image is split according to the segmentation regions to obtain multiple segments; the segments are clustered according to their image features, and the degree of similarity is determined between segments that belong to the same cluster; and whether duplicate page content exists is determined according to that degree of similarity. The method uses image processing to split the screenshot and compare the segments with one another, so repeated page content can be detected without preparing additional reference objects. This addresses the technical problems that manual detection in the prior art depends on human experience and that automated verification requires a large number of additional reference objects at high testing cost, and it improves the efficiency of duplicate page content detection.
To explain the previous embodiment more clearly, and in particular how duplicate content detection is carried out once the image to be detected has been obtained, this embodiment provides another duplicate page content detection method. Fig. 2 is a flow diagram of a duplicate page content detection method provided by Embodiment Two of the present invention.
As shown in Fig. 2, the duplicate page content detection method may include the following steps:
Step 201: take an interface screenshot of the page to be detected to obtain an image of the page.
Specifically, the image of the page to be detected is obtained by a screenshot operation that depends on the kind of page being detected.
In one possible case, for an APP, the screenshot of the page to be detected is taken on a mobile device; different operating systems use different operations to take the screenshot, and the result is the image of the page to be detected.
In another possible case, when a web page is being detected, the screenshot of the page to be detected is taken with a screenshot tool or screenshot software on a computer device, which yields the image of the page to be detected.
Step 202: determine the gray value of each pixel according to the grayscale information of the image, identify segmentation regions in the image, and merge adjacent segmentation regions.
Specifically, the grayscale information of an image is the gray value of each pixel in the image obtained after grayscale conversion. A pixel is the smallest unit of an image; an image is composed of many pixels arranged in an array. The detection terminal determines the gray value of each pixel from the grayscale information of the image, and the identification module then identifies the segmentation regions in the image.
Further, the segmentation regions are identified in the image of the page to be detected as follows: search along the rows or columns of the pixel array for at least one row or at least one column of pixels with identical gray values, and take the row or column found as a segmentation region. Adjacent regions are then checked for similar features, and adjacent segmentation regions with similar features are merged.
Step 203: split the image according to the segmentation regions to obtain multiple segments.
Specifically, image segmentation is the technique and process of dividing an image into several specific regions with distinct properties and extracting the objects of interest. The main image segmentation methods at present include threshold segmentation, region segmentation, and edge segmentation. This embodiment uses segmentation based on edge detection: the edge pixels in the image are determined first, and these pixels are then connected to form the required region boundaries. The segmentation problem is solved by detecting the edges between different regions, that is, the places where the gray level or structure changes abruptly, which mark where one region ends and another begins. This discontinuity is called an edge. Different regions have different gray levels and their boundaries generally show distinct edges, and this property can be used to split the image.
In one possible implementation, the image to be detected is split along the edges of the segmentation regions to obtain multiple segments. Specifically, the gray values of edge pixels in the image are discontinuous, and this discontinuity can be detected by taking derivatives.
In another possible implementation, a dividing line is determined inside each segmentation region, and the image to be detected is split along the dividing lines to obtain multiple segments.
Step 204: determine the proportion of the image area occupied by each segment, and delete segments whose area proportion is below the threshold proportion.
Specifically, after the image to be detected has been split into multiple segments, the proportion of the image area occupied by each segment must also be determined, and segments whose area proportion is below the threshold proportion are deleted. For example, regions whose height is less than 2 percent of the page height, or whose area is less than 0.2 percent of the page area, are deleted.
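The example thresholds above translate directly into a small filter. The function name `keep_segment` and its parameter names are hypothetical; the 2 percent and 0.2 percent values come from the text.

```python
def keep_segment(seg_height, seg_width, page_height, page_width,
                 min_height_ratio=0.02, min_area_ratio=0.002):
    """Keep a segment only if neither its height ratio nor its area ratio falls below its threshold."""
    height_ratio = seg_height / page_height
    area_ratio = (seg_height * seg_width) / (page_height * page_width)
    return height_ratio >= min_height_ratio and area_ratio >= min_area_ratio


# Page of 1000 x 500 pixels (height x width).
thin_strip = keep_segment(10, 500, 1000, 500)   # height ratio 0.01
banner = keep_segment(100, 500, 1000, 500)      # height ratio 0.10, area ratio 0.10
tiny_icon = keep_segment(20, 4, 1000, 500)      # area ratio 0.00016
```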
Step 205, it according to the characteristics of image of multiple segments, is clustered, and phase is determined to belonging to the same segment to cluster
Like degree.
It should be noted that being split to image to be detected, after obtaining multiple segments, for each segment, first
Color space conversion is carried out, is converted from rgb space to HSV space, it is special then to carry out color to the transformed segment of color space
Sign extraction, finally using the color characteristic extracted and segment area accounting in the picture as characteristics of image.Wherein, from RGB
Space, which is converted to HSV space, to be realized by conversion formula, and conversion regime is consistent with conversion regime in embodiment one, this implementation
Example repeats no more.
Specifically, after color space conversion, color features are first extracted from the converted segment. For the HSV color space, a color histogram is generally used for feature extraction, and the matching methods include the histogram intersection method, the distance method, the central moment method, the reference color table method, and the cumulative color histogram method.
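Of the matching methods listed, histogram intersection is the simplest to sketch. The example below builds a normalized histogram over the hue channel only and compares two histograms; restricting the histogram to hue, the choice of 8 bins, and the function names are assumptions, and the pixel list is assumed to be non-empty.

```python
def hue_histogram(hsv_pixels, bins=8):
    """Normalized histogram of the hue channel; hue is expected in [0, 1]."""
    counts = [0] * bins
    for h, _s, _v in hsv_pixels:
        counts[min(int(h * bins), bins - 1)] += 1
    total = len(hsv_pixels)
    return [c / total for c in counts]


def histogram_intersection(hist_a, hist_b):
    """Sum of bin-wise minima: 1.0 for identical normalized histograms, 0.0 for disjoint ones."""
    return sum(min(a, b) for a, b in zip(hist_a, hist_b))


reds = [(0.0, 1.0, 1.0)] * 4
blues = [(0.66, 1.0, 1.0)] * 4
mixed = reds[:2] + blues[:2]

print(histogram_intersection(hue_histogram(reds), hue_histogram(reds)))   # 1.0
print(histogram_intersection(hue_histogram(reds), hue_histogram(blues)))  # 0.0
print(histogram_intersection(hue_histogram(reds), hue_histogram(mixed)))  # 0.5
```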
Further, the segments are clustered into groups according to the extracted image features, and the degree of similarity is determined between segments belonging to the same cluster. Specifically, for two segments belonging to the same cluster, the picture similarity between the two segments is determined first. If the picture similarity between the two segments is greater than a first similarity, text recognition is performed on each of the two segments to obtain their text content and text positions. Finally, the text similarity of the two segments is determined according to the text content and text positions.
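The text does not specify how text content and text position are combined into one similarity value. Purely as an assumption, the sketch below scores the content with `difflib.SequenceMatcher`, scores the position by the distance between normalized text coordinates, and takes the smaller of the two scores; the function name and the combination rule are hypothetical.

```python
from difflib import SequenceMatcher


def text_similarity(text_a, pos_a, text_b, pos_b):
    """Similarity in [0, 1] from recognized text and its position.

    pos_a, pos_b: (x, y) of the text inside its segment, normalized to [0, 1].
    """
    content = SequenceMatcher(None, text_a, text_b).ratio()
    dx, dy = pos_a[0] - pos_b[0], pos_a[1] - pos_b[1]
    position = max(0.0, 1.0 - (dx * dx + dy * dy) ** 0.5)
    return min(content, position)


print(text_similarity("Buy now", (0.1, 0.2), "Buy now", (0.1, 0.2)))  # 1.0
print(text_similarity("abc", (0.1, 0.2), "xyz", (0.1, 0.2)))          # 0.0
```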
The picture similarity between two segments is determined by template matching and the correlation coefficient method. Template matching means finding the position of a target template in a frame of image: the place most similar to the template is the target. It suffices to compare every sub-region of the full image with the target template and find the sub-region most similar to it; that sub-region is the position of the target. The similarity between the two is then measured by computing the correlation coefficient between the target template and the sub-region. In this embodiment, a matrix corresponding to each segment is first generated from the pixel values of the pixels in the two segments. The picture similarity between the two segments is then determined from the correlation between the two corresponding matrices.
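The matrix correlation can be sketched with the Pearson correlation coefficient computed over all pixel values of the two segments. The sketch assumes that both matrices have the same shape (for example after resizing) and treats constant blocks as a special case; whether the embodiment uses exactly this coefficient is not stated, and the name `correlation` is hypothetical.

```python
import math


def correlation(block_a, block_b):
    """Pearson correlation coefficient between two equally sized pixel matrices."""
    a = [v for row in block_a for v in row]
    b = [v for row in block_b for v in row]
    if len(a) != len(b):
        raise ValueError("blocks must contain the same number of pixels")
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        # A constant block has no variance; fall back to exact equality.
        return 1.0 if a == b else 0.0
    return cov / math.sqrt(var_a * var_b)


block = [[1, 2], [3, 4]]
print(correlation(block, block))             # close to 1.0: identical content
print(correlation(block, [[4, 3], [2, 1]]))  # close to -1.0: inverted content
```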
Step 206: determine whether repeated page content exists by evaluating the picture similarity and text similarity of two segments in the same cluster.
Specifically, for the preprocessed segments belonging to the same cluster, the degree of similarity between segments is determined by comparing picture similarity and text similarity, and it is then determined whether repeated page content exists.
Further, the text content in a region is recognized by optical character recognition (OCR) scanning technology; the text recognition module completes recognition by extracting the features of different sample Chinese characters.
As one possible case, for two segments belonging to the same cluster, if the picture similarity between the two segments is greater than the first similarity and the text similarity of the two segments is greater than a second similarity, it is determined that repeated page content exists.
As another possible case, for two segments belonging to the same cluster, if the picture similarity between the two segments is not greater than the first similarity, or the text similarity of the two segments is not greater than the second similarity, it is determined that no repeated page content exists.
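The two cases amount to a two-stage test, sketched below. The text similarity is passed as a callable so that text recognition runs only when the picture similarity already exceeds the first similarity, as described above; the threshold values 0.9 and 0.8 and the function name are hypothetical, since the embodiment does not give numeric thresholds.

```python
from typing import Callable


def is_repeated(picture_sim: float,
                text_sim_fn: Callable[[], float],
                first_similarity: float = 0.9,
                second_similarity: float = 0.8) -> bool:
    """Return True only if both similarities exceed their thresholds."""
    if picture_sim <= first_similarity:
        return False  # text recognition is skipped entirely
    return text_sim_fn() > second_similarity


calls = []


def fake_text_similarity() -> float:
    """Stand-in for OCR plus text comparison; records that it was invoked."""
    calls.append("ocr")
    return 0.95


print(is_repeated(0.97, fake_text_similarity))  # True
print(is_repeated(0.50, fake_text_similarity))  # False, OCR not invoked again
```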
As one possible application scenario, when the method provided in this embodiment is executed, the repeated page content detection method is the same, but the detection steps may differ: the foregoing steps may be merged, or one step may be split into several. This embodiment therefore also provides the method flow shown in Fig. 3, which is a schematic flowchart of another repeated page content detection method provided by an embodiment of the present invention.
As shown in Fig. 3, step 301 is executed first: the picture to be detected is input to the detection terminal, where the picture to be detected is obtained by taking a screenshot of the interface of the page to be detected.
Next, step 302 is executed: page segmentation is performed on the picture to be detected. This step includes marking segmentation regions, merging segmentation regions, and determining dividing lines.
Specifically, after the picture to be detected is converted to grayscale, the gray level of each pixel is determined from the grayscale information of the image. The identification module then identifies segmentation regions in the image and checks whether adjacent regions have similar features; if they do, the adjacent segmentation regions are merged. Finally, the image to be detected can be split along the edges of the segmentation regions to obtain multiple segments, or a dividing line can be determined inside each segmentation region and the image split along the dividing lines to obtain multiple segments.
Further, step 303 is executed: the segmented image is preprocessed. This step includes filtering the segmented image, extracting region features, and clustering according to the extracted features.
Specifically, after the image to be detected has been split into multiple segments, the area ratio of each segment in the image is determined, and segments whose area ratio is less than the threshold ratio are deleted. Each remaining segment then undergoes color space conversion from RGB space to HSV space, color features are extracted from the converted segment, and the extracted color features and the area ratio of the segment in the image are used as its image features. Finally, the segments are clustered into groups according to the extracted image features.
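The embodiment does not name a clustering algorithm. As one assumed possibility, the sketch below groups segments greedily: a segment joins the first cluster whose first member lies within a fixed Euclidean distance in feature space, and otherwise starts a new cluster. The feature layout (a color value followed by the area ratio), the distance limit, and the function name are all hypothetical.

```python
import math


def cluster_segments(features, max_distance=0.1):
    """Greedy clustering; returns lists of indices into `features`."""
    clusters = []
    for index, feature in enumerate(features):
        for members in clusters:
            representative = features[members[0]]
            if math.dist(feature, representative) <= max_distance:
                members.append(index)
                break
        else:
            clusters.append([index])  # no existing cluster is close enough
    return clusters


# (dominant hue, area ratio) per segment
features = [(0.10, 0.20), (0.11, 0.20), (0.90, 0.50)]
print(cluster_segments(features))  # [[0, 1], [2]]
```

Only segments that end up in the same cluster are compared pairwise afterwards, which keeps the number of picture and text comparisons small.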
Then, step 304 is executed: the preprocessed picture is compared. Finally, step 305 is executed: the repeated regions are output.
Specifically, the preprocessed picture is compared: for preprocessed segments belonging to the same cluster, picture similarity and text similarity are compared to determine the degree of similarity between segments, and it is then determined whether repeated page content exists. Finally, the detection terminal outputs the repeated regions of the page content.
The detection steps shown in Fig. 3 are only briefly described above; the specific detection method is the same as that described in this embodiment and is not repeated here.
In the repeated page content detection method of the embodiment of the present invention, a screenshot of the interface of the page to be detected is taken to obtain an image of the page; the gray level of each pixel is determined from the grayscale information of the image, segmentation regions are identified in the image, and adjacent segmentation regions are merged; the image is split according to the segmentation regions to obtain multiple segments; the area ratio of each segment in the image is determined, and segments whose area ratio is less than the threshold ratio are deleted; the segments are clustered according to their image features, and the degree of similarity is determined between segments belonging to the same cluster; and whether repeated page content exists is determined by evaluating the picture similarity and text similarity of two segments in the same cluster. The method uses image processing to split the image obtained from the interface screenshot and compares the resulting segments with one another, so repeated page content can be detected without preparing any additional reference. This solves the technical problem in the prior art that manual detection depends on human experience and automated verification requires a large number of additional references, which makes testing costly, and it improves the efficiency of repeated page content detection.
Applying the repeated content detection method described in the above embodiments in testing yields the results shown in Fig. 4, which is an example of repeated page content found with the repeated content detection method of the present invention. With the above detection method, it can be found that each of the three pictures in Fig. 4 contains repeated content: the text content in the two rectangular boxes at the upper left of the left picture is repeated, and image content is repeated in the two pictures on the right, which indicates an anomaly.
In order to implement the above embodiments, the present invention further proposes a repeated page content detection device.
Fig. 5 is a schematic structural diagram of a repeated page content detection device provided by an embodiment of the present invention.
As shown in Fig. 5, the repeated page content detection device includes an acquisition module 110, an identification module 120, a segmentation module 130, a cluster module 140, and a detection module 150.
The acquisition module 110 is configured to take a screenshot of the interface of the page to be detected to obtain an image of the page.
Specifically, the acquisition module 110 obtains the image of the page by taking a screenshot of the interface of the page to be detected. Mobile devices of different brands take screenshots through different operations to obtain the picture to be detected, which is then input to the detection terminal.
The identification module 120 is configured to identify segmentation regions in the image according to the grayscale information of the image.
Specifically, after the picture to be detected is converted to grayscale, the gray value of each pixel in the image is obtained. The rows or columns of the grayscale pixel array are searched to find at least one row or at least one column of pixels with identical gray levels, and the rows or columns found are marked as segmentation regions. In this way the identification module 120 identifies segmentation regions in the image according to the gray value of each pixel.
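The row search can be sketched as follows for a grayscale image stored as a list of rows; columns can be handled the same way after transposing the image. The sketch also merges adjacent uniform rows into bands, corresponding to the merging of adjacent segmentation regions. Requiring exactly identical gray values within a row, and the function names, are assumptions made for illustration.

```python
def uniform_rows(gray):
    """Indices of rows in which every pixel has the same gray value."""
    return [y for y, row in enumerate(gray) if all(v == row[0] for v in row)]


def merge_adjacent(rows):
    """Merge consecutive row indices into (start, end) bands, end exclusive."""
    bands = []
    for y in rows:
        if bands and y == bands[-1][1]:
            bands[-1] = (bands[-1][0], y + 1)  # extend the current band
        else:
            bands.append((y, y + 1))           # start a new band
    return bands


gray = [
    [255, 255, 255],  # uniform
    [255, 255, 255],  # uniform, adjacent to the row above
    [10, 200, 30],    # content
    [0, 0, 0],        # uniform
]
rows = uniform_rows(gray)
print(rows)                  # [0, 1, 3]
print(merge_adjacent(rows))  # [(0, 2), (3, 4)]
```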
The segmentation module 130 is configured to split the image according to the segmentation regions to obtain multiple segments.
Specifically, after the identification module 120 has identified the segmentation regions in the image, the segmentation module 130 segments the image. Image segmentation is the technique and process of dividing an image into several specific regions with distinctive properties and extracting the objects of interest.
This embodiment uses an image segmentation method based on edge detection. The basic idea is to first determine the edge pixels in the image and then connect these pixels to form the required region boundaries. The segmentation problem is solved by detecting the edges between different regions, that is, by detecting where the gray level or structure changes abruptly, which indicates the end of one region and the start of another; this discontinuity is called an edge. Different images have different gray levels, and boundaries generally show distinct edges, a property that can be used to split the image into multiple segments.
The cluster module 140 is configured to cluster the segments according to their image features and to determine the degree of similarity between segments belonging to the same cluster.
Specifically, the image features of a segment are obtained as follows: after the image to be detected has been split into multiple segments, each segment first undergoes color space conversion from RGB space to HSV space; color features are then extracted from the converted segment; finally, the extracted color features and the area ratio of the segment in the image are used as its image features. The cluster module 140 clusters the segments into groups according to their image features, so that segments in the same group have similar image features.
Further, for two segments belonging to the same cluster, the cluster module 140 first determines the picture similarity between the two segments. If the picture similarity between the two segments is greater than the first similarity, text recognition is performed on each of the two segments to obtain their text content and text positions. Finally, the text similarity of the two segments is determined according to the text content and text positions.
The detection module 150 is configured to determine, according to the degree of similarity between segments belonging to the same cluster, whether repeated page content exists.
Specifically, for segments belonging to the same cluster, the detection module 150 determines the degree of similarity between segments by comparing picture similarity and text similarity, and then determines whether repeated page content exists.
As one possible case, for two segments belonging to the same cluster, if the picture similarity between the two segments is greater than the first similarity and the text similarity of the two segments is greater than the second similarity, it is determined that repeated page content exists.
As another possible case, for two segments belonging to the same cluster, if the picture similarity between the two segments is not greater than the first similarity, or the text similarity of the two segments is not greater than the second similarity, it is determined that no repeated page content exists.
The repeated page content detection device of the embodiment of the present invention takes a screenshot of the interface of the page to be detected to obtain an image of the page; identifies segmentation regions in the image according to the grayscale information of the image; splits the image according to the segmentation regions to obtain multiple segments; clusters the segments according to their image features and determines the degree of similarity between segments belonging to the same cluster; and determines, according to the degree of similarity between segments belonging to the same cluster, whether repeated page content exists. The image obtained from the interface screenshot is split by image processing and the resulting segments are compared with one another, so repeated page content can be detected without preparing any additional reference. This solves the technical problem in the prior art that manual detection depends on human experience and automated verification requires a large number of additional references, which makes testing costly, and it improves the efficiency of repeated page content detection.
It should be noted that the foregoing explanation of the repeated page content detection method embodiments also applies to the repeated page content detection device of this embodiment and is not repeated here.
In order to implement the above embodiments, the present invention further proposes a computer device, including a processor and a memory for storing instructions executable by the processor.
The processor reads the executable program code stored in the memory and runs a program corresponding to that code, so as to implement the repeated page content detection method proposed by the present invention.
In order to implement the above embodiments, the present invention further proposes a computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the repeated page content detection method proposed in the above embodiments of the first aspect.
Fig. 6 shows a block diagram of an exemplary computer device suitable for implementing the embodiments of the present application. The computer device 12 shown in Fig. 6 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present application.
As shown in Fig. 6, the computer device 12 takes the form of a general-purpose computing device. The components of the computer device 12 may include, but are not limited to, one or more processors or processing units 16, a system memory 28, and a bus 18 connecting the different system components (including the system memory 28 and the processing unit 16).
The bus 18 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
The computer device 12 typically includes a variety of computer system readable media. These media may be any available media that can be accessed by the computer device 12, including volatile and non-volatile media and removable and non-removable media.
The memory 28 may include computer system readable media in the form of volatile memory, such as random access memory (RAM) 30 and/or cache memory 32. The computer device 12 may further include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, the storage system 34 may be used to read from and write to non-removable, non-volatile magnetic media (not shown in Fig. 6, commonly referred to as a "hard disk drive"). Although not shown in Fig. 6, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (for example, a "floppy disk") and an optical disc drive for reading from and writing to a removable non-volatile optical disc (for example, a compact disc read-only memory (CD-ROM), a digital video disc read-only memory (DVD-ROM), or other optical media) may be provided. In these cases, each drive may be connected to the bus 18 through one or more data media interfaces. The memory 28 may include at least one program product having a set of (for example, at least one) program modules configured to perform the functions of the embodiments of the present application.
A program/utility 40 having a set of (at least one) program modules 42 may be stored, for example, in the memory 28. Such program modules 42 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination of them, may include an implementation of a network environment. The program modules 42 generally perform the functions and/or methods of the embodiments described in the present application.
The computer device 12 may also communicate with one or more external devices 14 (such as a keyboard, a pointing device, or a display 24), with one or more devices that enable a user to interact with the computer device 12, and/or with any device (such as a network card or a modem) that enables the computer device 12 to communicate with one or more other computing devices. Such communication may take place through an input/output (I/O) interface 22. In addition, the computer device 12 may communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through a network adapter 20. As shown in the figure, the network adapter 20 communicates with the other modules of the computer device 12 through the bus 18. It should be understood that, although not shown in the figure, other hardware and/or software modules may be used in conjunction with the computer device 12, including but not limited to microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
The processing unit 16 executes various functional applications and data processing by running programs stored in the system memory 28, for example to implement the methods described in the foregoing embodiments.
In the description of this specification, a description referring to the terms "one embodiment", "some embodiments", "an example", "a specific example", or "some examples" means that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. In this specification, such schematic uses of these terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, provided they do not contradict one another, those skilled in the art may combine the different embodiments or examples described in this specification, as well as the features of those embodiments or examples.
In addition, the terms "first" and "second" are used for descriptive purposes only and are not to be understood as indicating or implying relative importance or implicitly indicating the number of the technical features referred to. A feature qualified by "first" or "second" may therefore explicitly or implicitly include at least one such feature. In the description of the present invention, "plurality" means at least two, for example two or three, unless specifically defined otherwise.
Any process or method description in a flowchart or otherwise described herein may be understood as representing a module, segment, or portion of code that includes one or more executable instructions for implementing the steps of a custom logic function or process. The scope of the preferred embodiments of the present invention includes other implementations in which functions may be performed out of the order shown or discussed, including substantially simultaneously or in the reverse order depending on the functions involved, as should be understood by those skilled in the art to which the embodiments of the present invention belong.
The logic and/or steps represented in a flowchart or otherwise described herein, for example an ordered list of executable instructions for implementing logic functions, may be embodied in any computer-readable medium for use by, or in connection with, an instruction execution system, apparatus, or device (such as a computer-based system, a system including a processor, or another system that can fetch instructions from the instruction execution system, apparatus, or device and execute them). For the purposes of this specification, a "computer-readable medium" may be any means that can contain, store, communicate, propagate, or transport a program for use by, or in connection with, an instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of computer-readable media include: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM). The computer-readable medium may even be paper or another suitable medium on which the program is printed, because the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting, or otherwise processing it in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that each part of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, multiple steps or methods may be implemented in software or firmware that is stored in a memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, they may be implemented by any one or a combination of the following techniques well known in the art: a discrete logic circuit having logic gate circuits for implementing logic functions on data signals, an application-specific integrated circuit having suitable combinational logic gate circuits, a programmable gate array (PGA), a field-programmable gate array (FPGA), and so on.
Those of ordinary skill in the art will understand that all or part of the steps carried by the methods of the above embodiments may be completed by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, performs one or a combination of the steps of the method embodiments.
In addition, the functional units in the embodiments of the present invention may be integrated in one processing module, each unit may exist physically on its own, or two or more units may be integrated in one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. If the integrated module is implemented in the form of a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like. Although embodiments of the present invention have been shown and described above, it should be understood that the above embodiments are exemplary and are not to be construed as limiting the present invention; those of ordinary skill in the art may make changes, modifications, substitutions, and variations to the above embodiments within the scope of the present invention.
Claims (12)
1. A repeated page content detection method, characterized in that the method comprises the following steps:
taking a screenshot of the interface of a page to be detected to obtain an image of the page;
identifying segmentation regions in the image according to grayscale information of the image;
splitting the image according to the segmentation regions to obtain multiple segments;
clustering the segments according to image features of the multiple segments, and determining a degree of similarity between segments belonging to the same cluster;
determining, according to the degree of similarity between segments belonging to the same cluster, whether repeated page content exists.
2. The repeated page content detection method according to claim 1, characterized in that identifying segmentation regions in the image according to the grayscale information of the image comprises:
determining the gray level of each pixel in the image according to the grayscale information, wherein the pixels in the image are arranged in an array;
searching the image along the rows or columns of the array to obtain at least one row or at least one column of pixels with identical gray levels, and taking the at least one row or at least one column of pixels found as a segmentation region.
3. The repeated page content detection method according to claim 2, characterized in that, after taking the at least one row or at least one column of pixels found as a segmentation region, the method further comprises:
merging adjacent segmentation regions.
4. The repeated page content detection method according to claim 2, characterized in that splitting the image according to the segmentation regions to obtain multiple segments comprises:
splitting the image along the edges of the segmentation regions to obtain multiple segments;
or determining a dividing line inside each segmentation region and splitting the image along the dividing line to obtain multiple segments.
5. The repeated page content detection method according to any one of claims 1-4, characterized in that, after splitting the image according to the segmentation regions to obtain multiple segments, the method further comprises:
determining the area ratio of each segment in the image;
deleting the segments whose area ratio is less than a threshold ratio.
6. The repeated page content detection method according to any one of claims 1-4, characterized in that, after splitting the image according to the segmentation regions to obtain multiple segments, the method further comprises:
performing, for each segment, color space conversion from RGB space to HSV space;
extracting color features from the segment after color space conversion;
using the extracted color features and the area ratio of the segment in the image as the image features.
7. The repeated page content detection method according to any one of claims 1-4, characterized in that determining the degree of similarity between segments belonging to the same cluster comprises:
determining, for two segments belonging to the same cluster, the picture similarity between the two segments;
if the picture similarity between the two segments is greater than a first similarity, performing text recognition on each of the two segments to obtain text content and text positions;
determining the text similarity of the two segments according to the text content and text positions.
8. The repeated page content detection method according to claim 7, characterized in that determining, according to the degree of similarity between segments belonging to the same cluster, whether repeated page content exists comprises:
for two segments belonging to the same cluster, if the picture similarity between the two segments is greater than the first similarity and the text similarity of the two segments is greater than a second similarity, determining that repeated page content exists;
if the picture similarity between the two segments is not greater than the first similarity, or the text similarity of the two segments is not greater than the second similarity, determining that no repeated page content exists.
9. The repeated page content detection method according to claim 7, characterized in that determining the picture similarity between the two segments comprises:
generating a matrix corresponding to each segment according to the pixel values of the pixels in the two segments;
determining the picture similarity between the two segments according to the correlation between the matrices corresponding to the two segments.
10. A duplicate page content detection device, characterized in that the device comprises:
An acquisition module, configured to take an interface screenshot of the page to be detected to obtain an image of the page;
An identification module, configured to identify segmentation regions in the image according to the grayscale information of the image;
A segmentation module, configured to segment the image according to the segmentation regions to obtain multiple segments;
A clustering module, configured to cluster the multiple segments according to their image features, and to determine the degree of similarity for segments belonging to the same cluster;
A detection module, configured to determine whether duplicate page content exists according to the degree of similarity between segments belonging to the same cluster.
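The module structure of claim 10 can be sketched as a pipeline in which each module is an injected callable. This is only an architectural sketch: the class name `DuplicatePageContentDetector`, the field names, and the single `threshold` are hypothetical simplifications, and the actual screenshot, grayscale analysis, and clustering logic are left to the caller.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class DuplicatePageContentDetector:
    """Wires the claimed modules together; every stage is supplied by the caller."""
    acquire: Callable[[str], Any]                     # page -> screenshot image
    identify_regions: Callable[[Any], List[Any]]      # image -> segmentation regions
    segment: Callable[[Any, List[Any]], List[Any]]    # image, regions -> segments
    cluster: Callable[[List[Any]], List[List[Any]]]   # segments -> clusters
    similarity: Callable[[Any, Any], float]           # pairwise similarity in a cluster
    threshold: float                                  # hypothetical single cut-off

    def detect(self, page: str) -> bool:
        """Return True if any two segments in the same cluster are similar enough."""
        image = self.acquire(page)
        regions = self.identify_regions(image)
        segments = self.segment(image, regions)
        for group in self.cluster(segments):
            # Compare only segments within the same cluster.
            for i in range(len(group)):
                for j in range(i + 1, len(group)):
                    if self.similarity(group[i], group[j]) > self.threshold:
                        return True
        return False
```

Restricting comparisons to segments within one cluster avoids the cost of comparing every pair of segments on the page.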
11. A computer device, characterized by comprising: a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the duplicate page content detection method according to any one of claims 1-9.
12. A computer-readable storage medium having a computer program stored thereon, characterized in that the program, when executed by a processor, implements the duplicate page content detection method according to any one of claims 1-9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810545595.7A CN108764352B (en) | 2018-05-25 | 2018-05-25 | Method and device for detecting repeated page content |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108764352A (en) | 2018-11-06 |
CN108764352B (en) | 2022-09-27 |
Family
ID=64000956
Family Applications (1)
Application Number | Status | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
CN201810545595.7A | Active | CN108764352B (en) | 2018-05-25 | 2018-05-25 | Method and device for detecting repeated page content |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108764352B (en) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109615017A (en) * | 2018-12-21 | 2019-04-12 | 大连海事大学 | Consider the Stack Overflow replication problem detection method of more reference factors |
CN109670507A (en) * | 2018-11-27 | 2019-04-23 | 维沃移动通信有限公司 | Image processing method, device and mobile terminal |
CN109739752A (en) * | 2018-12-21 | 2019-05-10 | 北京城市网邻信息技术有限公司 | Built-in resource testing method, apparatus, electronic equipment and readable storage medium storing program for executing |
CN110147516A (en) * | 2019-04-15 | 2019-08-20 | 深圳壹账通智能科技有限公司 | The intelligent identification Method and relevant device of front-end code in Pages Design |
CN110532188A (en) * | 2019-08-30 | 2019-12-03 | 北京三快在线科技有限公司 | The method and apparatus of page presentation test |
CN110716778A (en) * | 2019-09-10 | 2020-01-21 | 阿里巴巴集团控股有限公司 | Application compatibility testing method, device and system |
WO2020177584A1 (en) * | 2019-03-01 | 2020-09-10 | 华为技术有限公司 | Graphic typesetting method and related device |
CN112527282A (en) * | 2020-12-18 | 2021-03-19 | 平安银行股份有限公司 | Front-end page checking method and device, electronic equipment and storage medium |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070025617A1 (en) * | 2005-06-09 | 2007-02-01 | Canon Kabushiki Kaisha | Image processing method and apparatus |
CN101145902A (en) * | 2007-08-17 | 2008-03-19 | 东南大学 | Phishing webpage detection method based on image processing |
US20080137954A1 (en) * | 2006-12-12 | 2008-06-12 | Yichuan Tang | Method And Apparatus For Identifying Regions Of Different Content In An Image |
CN101859309A (en) * | 2009-04-07 | 2010-10-13 | 慧科讯业有限公司 | System and method for identifying repeated text |
CN105022752A (en) * | 2014-04-29 | 2015-11-04 | 中国电信股份有限公司 | Image retrieval method and apparatus |
CN105404683A (en) * | 2015-11-30 | 2016-03-16 | 北大方正集团有限公司 | Format file processing method and apparatus |
CN105678814A (en) * | 2016-01-05 | 2016-06-15 | 武汉大学 | Method for detecting repetitive texture of building facade image in combination with phase correlation analysis |
CN106156749A (en) * | 2016-07-25 | 2016-11-23 | 福建星网锐捷安防科技有限公司 | Method for detecting human face based on selective search and device |
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109670507A (en) * | 2018-11-27 | 2019-04-23 | 维沃移动通信有限公司 | Image processing method, device and mobile terminal |
CN109615017A (en) * | 2018-12-21 | 2019-04-12 | 大连海事大学 | Consider the Stack Overflow replication problem detection method of more reference factors |
CN109739752A (en) * | 2018-12-21 | 2019-05-10 | 北京城市网邻信息技术有限公司 | Built-in resource testing method, apparatus, electronic equipment and readable storage medium storing program for executing |
CN109739752B (en) * | 2018-12-21 | 2022-10-25 | 北京城市网邻信息技术有限公司 | Built-in resource testing method and device, electronic equipment and readable storage medium |
CN109615017B (en) * | 2018-12-21 | 2021-06-29 | 大连海事大学 | Stack Overflow repeated problem detection method considering multiple reference factors |
WO2020177584A1 (en) * | 2019-03-01 | 2020-09-10 | 华为技术有限公司 | Graphic typesetting method and related device |
US11790584B2 (en) | 2019-03-01 | 2023-10-17 | Huawei Technologies Co., Ltd. | Image and text typesetting method and related apparatus thereof |
CN110147516A (en) * | 2019-04-15 | 2019-08-20 | 深圳壹账通智能科技有限公司 | The intelligent identification Method and relevant device of front-end code in Pages Design |
CN110532188B (en) * | 2019-08-30 | 2021-06-29 | 北京三快在线科技有限公司 | Page display test method and device |
CN110532188A (en) * | 2019-08-30 | 2019-12-03 | 北京三快在线科技有限公司 | The method and apparatus of page presentation test |
CN110716778A (en) * | 2019-09-10 | 2020-01-21 | 阿里巴巴集团控股有限公司 | Application compatibility testing method, device and system |
CN110716778B (en) * | 2019-09-10 | 2023-09-26 | 创新先进技术有限公司 | Application compatibility testing method, device and system |
CN112527282A (en) * | 2020-12-18 | 2021-03-19 | 平安银行股份有限公司 | Front-end page checking method and device, electronic equipment and storage medium |
CN112527282B (en) * | 2020-12-18 | 2023-11-07 | 平安银行股份有限公司 | Front-end page verification method and device, electronic equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN108764352B (en) | 2022-09-27 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN108764352A (en) | Duplicate pages content detection algorithm and device | |
US10817741B2 (en) | Word segmentation system, method and device | |
CN109657673B (en) | Image recognition method and terminal | |
CN107093172A (en) | character detecting method and system | |
EP2323069A2 (en) | Method, device and system for content based image categorization field | |
CN107292307B (en) | Automatic identification method and system for inverted Chinese character verification code | |
CN104899586A (en) | Method for recognizing character contents included in image and device thereof | |
CN108960382A (en) | A kind of colour barcode and its color calibration method | |
CN111738252B (en) | Text line detection method, device and computer system in image | |
CN111259891B (en) | Method, device, equipment and medium for identifying identity card in natural scene | |
Shafait et al. | Pixel-accurate representation and evaluation of page segmentation in document images | |
CN113569863B (en) | Document checking method, system, electronic equipment and storage medium | |
KR20200020305A (en) | Method and Apparatus for character recognition | |
CN109858570A (en) | Image classification method and system, computer equipment and medium | |
Dutta et al. | Multi-lingual text localization from camera captured images based on foreground homogenity analysis | |
CN112507923A (en) | Certificate copying detection method and device, electronic equipment and medium | |
CN103136536A (en) | System and method for detecting target and method for exacting image features | |
CN116012860B (en) | Teacher blackboard writing design level diagnosis method and device based on image recognition | |
Zhang et al. | Computational method for calligraphic style representation and classification | |
WO2023159771A1 (en) | Rpa and ai-based invoice processing method and apparatus, device, and medium | |
Lin et al. | Multilingual corpus construction based on printed and handwritten character separation | |
CN112861861B (en) | Method and device for recognizing nixie tube text and electronic equipment | |
CN103136524A (en) | Object detecting system and method capable of restraining detection result redundancy | |
JP2003087562A (en) | Image processor and image processing method | |
CN113807315A (en) | Method, device, equipment and medium for constructing recognition model of object to be recognized |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||