CN109829092A - Method for directionally monitoring a webpage - Google Patents

Info

Publication number
CN109829092A
Authority
CN
China
Prior art keywords
frame
content
user
webpage
choosing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811604429.6A
Other languages
Chinese (zh)
Other versions
CN109829092B (en)
Inventor
孙再连
吴谋荣
苏淮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen Yitong Intelligent Technology Group Co ltd
Original Assignee
Xiamen Yitong Software Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen Yitong Software Technology Co Ltd
Priority to CN201811604429.6A
Publication of CN109829092A
Application granted
Publication of CN109829092B
Active
Anticipated expiration

Landscapes

  • Information Transfer Between Computers (AREA)

Abstract

The invention discloses a method for directionally monitoring a webpage: content on the webpage is box-selected, each box-selected item is crawled, and related information for each item is provided, including its title, abstract, URL, and page body text. By box-selecting directly on the webpage, the method obtains the related information of the selected content simply and quickly. It can also automatically obtain content on the page that is of the same kind as the selected content but was not itself selected, so the user does not have to box-select repeatedly and works more efficiently. The method can also record historical box-selection operations and judge whether different selections correspond to the same content; when they do, it provides the content and related information crawled by the historical selection, which avoids crawling the webpage repeatedly and wasting resources. A manual supervision mechanism and a manual judgment mechanism are also added to improve the accuracy and reliability of the method.

Description

Method for directionally monitoring a webpage
Technical field
The present invention relates to the technical field of webpage monitoring, and in particular to a method for directionally monitoring a webpage.
Background
In the age of information explosion, with massive amounts of data on the internet, obtaining data quickly and accurately has become a strong demand for both individuals and enterprises.
Many kinds of crawler tools and data-collection products are on the market. Some are simple and intuitive to use, but they have a number of problems with generality, maintainability, and accuracy, specifically:
1. Data sources are tied to strategy schemes customized by the product itself, so deep data customization is not possible;
2. The configuration process is extremely complex and demands highly qualified personnel; only professionals can complete it;
3. A given parser can only extract from a specific page; extracting different columns from multiple different dynamic websites requires writing multiple parsers, which increases system complexity;
4. When certain features of a target page change, for example when page links or the page layout are modified, the corresponding parser must also be modified; if many target pages are involved or the changes are large, modifying the parsers becomes correspondingly harder.
Therefore, the market currently needs a webpage monitoring method that combines machine learning with user-behavior-trajectory monitoring, is highly general and easy to maintain, and can use natural language processing to make text extraction simpler and more accurate.
Summary of the invention
To solve the above technical problems, the present invention provides a method for directionally monitoring a webpage, characterized in that content on the webpage is box-selected, each box-selected item is crawled, and related information for each item is provided, the related information including the title, abstract, URL, page body text, and so on. The method is simple and quick.
Optionally, the box selection uses a screenshot-based positioning approach.
Optionally, position information is obtained from the user's box-selected region, and the positions of all elements in the webpage are then compared with the position of the user's selection to preliminarily screen out the matching content, which is the content the user wants to know about.
Optionally, the box-selected region may be positioned as follows. During box selection, the starting point is recorded as (X1, Y1) and the end point as (X2, Y2); the two points enclose a rectangular box-selected region. The horizontal and vertical displacement caused by the user dragging the page scroll bars is added to these coordinates, giving the absolute coordinates of the starting and end points, (X1+ScrollLeft, Y1+ScrollTop) and (X2+ScrollLeft, Y2+ScrollTop), where ScrollLeft is the horizontal scroll offset of the page and ScrollTop is the vertical scroll offset.
The coordinates of each content item on the webpage are obtained; the coordinates of any item A are denoted (Xa, Ya), its length (horizontal extent) is W, and its width (vertical extent) is H.
An exclusion method is used to judge whether element A lies in the user's box-selected region. Content A is judged not to be in the region when Xa+W < X1+ScrollLeft, or X2+ScrollLeft < Xa, or Ya+H < Y1+ScrollTop, or Y2+ScrollTop < Ya, or when Xa < X1+ScrollLeft and Ya < Y1+ScrollTop and Xa+W > X2+ScrollLeft and Ya+H > Y2+ScrollTop all hold; otherwise, content A is judged to be in the region. Repeating these steps yields all content on the page that was box-selected.
Optionally, the webpage source code is classified and tagged by machine learning. Through machine learning, user operations are simulated, and intelligent crawling, tagging, and drill-down are used to simulate the user's box-selection trajectory on the content the user is interested in and to mine the page in depth, so as to crawl content on the page that the user did not box-select but needs.
Optionally, all content in the box-selected region is collected to form a set B, and the first element b1 and the last element bn are taken from B. b1 and bn are analyzed to obtain their common parent node; if the parent-node levels of b1 and bn differ, the two elements are considered not to be of the same type, bn is discarded, element bn-1 is taken instead, and the common parent node of b1 and bn-1 is analyzed, and so on until an element bm that shares a common parent node with b1 is found. The styles of b1 and bm are then compared; if they differ, bm is discarded, bm-1 is taken instead, and the styles of b1 and bm-1 are compared, and so on until elements b1 and bz with a common style are found. All parent nodes of b1 and of bz are obtained and denoted list1 and listz; the deepest node that list1 and listz have in common is denoted node1, which is the nearest common parent node of b1 and bz. Under node1, the elements that have the same style as b1 are found, giving the set Y = {b1, ..., bz}, which is the content the user needs to obtain.
Optionally, the user's real-time box-selected region is compared with the user's own historical regions or with other users' historical regions to judge whether they are the same region. When they are judged to be the same, the real-time related information of the selected content is obtained according to the historical selection; when they are judged to be different, the content that the user did not box-select but needs, together with its related information, is crawled. With this judgment, the webpage does not need to be crawled again for an identical selection, which reduces the number of crawls and avoids wasting resources on repeated crawling.
Optionally, the same box-selected region is determined as follows: the starting-point and end-point coordinates of the user's selection, (X1, Y1, X2, Y2), are used as input parameters to build an SVM classifier; box-selected regions are classified according to where all the content between the starting point and the end point is located in the whole webpage, and the same-region judgment is then made from the classification result.
Optionally, a user supervision mechanism is added to the classifier: the user judges whether the classification result is the content the user cares about, and the judgment is added to the training set for the next round of training. The training set is cleaned and retrained periodically, noise classes produced by user misoperation are merged, and the correct judgments are finally saved in the training set. The saved training result is loaded whenever the classifier is used, which avoids the waste of resources caused by repeated training.
Optionally, machine learning, supervised learning, and reinforcement learning are applied continuously to the user's directed box-selection behavior according to the user's identity features, so that box selections of the content the user cares about are recommended automatically. The automatic recommendation trains a Bayes classifier on user-behavior data samples. When a recommended selection meets the user's needs, the classifier automatically stores the user behavior and the recommendation result in the data sample library; when it does not, the program automatically jumps to the manual box-selection interface and learns from the user's behavior and selection result, which improves the accuracy of the automatic recommendations.
From the above description, the present invention has the following advantages over the prior art:
1. By box-selecting on the webpage, the related information of the selected content is obtained directly, simply and quickly;
2. Content on the webpage that is of the same kind as the selected content but was not box-selected can be obtained automatically, which spares the user repeated box selections and improves efficiency;
3. Historical box-selection operations can be recorded, and whether different selections correspond to the same content can be judged; when they do, the content and related information obtained by the historical selection are provided;
4. A manual supervision mechanism and a manual judgment mechanism are added, improving the accuracy and reliability of the method.
Brief description of the drawings
The drawings described here provide a further understanding of the invention and form part of it; the illustrative embodiments and their descriptions are used to explain the invention and do not unduly limit it.
In the drawings:
Fig. 1 is a flow diagram of Embodiment 1 of the method for directionally monitoring a webpage;
Fig. 2 is a flow diagram of Embodiment 3 of the method for directionally monitoring a webpage;
Fig. 3 is a flow diagram of Embodiment 4 of the method for directionally monitoring a webpage;
Fig. 4 is a step diagram of Embodiment 4 of the method for directionally monitoring a webpage;
Fig. 5 is a step diagram of the user judgment mechanism of Embodiment 5 of the method for directionally monitoring a webpage.
Detailed description of the embodiments
To make the technical problems to be solved, the technical solutions, and the advantages clearer, the present invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only explain the invention and are not intended to limit it.
Embodiment 1: referring to Fig. 1, in a method for directionally monitoring a webpage, content on the webpage is box-selected using a screenshot-based positioning approach, each box-selected item is crawled, and related information for each item is provided, the related information including the title, abstract, URL, page body text, and so on. The method is simple and quick.
Before box selection, the user navigates to the target website to be crawled, and a picture of the whole webpage is generated. The user box-selects on the picture, taking a screenshot, to choose the data and content to crawl; that is, three events are recognized on the picture (left mouse button pressed, mouse moved, and mouse button released) to obtain the box-selected region. The data and content the user wants are crawled, analyzed, and then presented to the user.
Embodiment 2: on the basis of Embodiment 1, to ensure that the content obtained is what the user needs, position information is obtained from the user's box-selected region, and the positions of all elements in the webpage are then compared with the position of the user's selection to preliminarily screen out the matching content, which is the content the user wants to know about.
In this embodiment, the box-selected region may be positioned as follows. First, during box selection, the mouse starting point is recorded as (X1, Y1) and the end point as (X2, Y2); the two points enclose a rectangular box-selected region. The user may have dragged the browser's scroll bars before taking the screenshot, so the horizontal and vertical displacement caused by dragging the page scroll bars is added to these coordinates, giving the absolute coordinates of the starting and end points, (X1+ScrollLeft, Y1+ScrollTop) and (X2+ScrollLeft, Y2+ScrollTop), where ScrollLeft is the horizontal scroll offset of the page and ScrollTop is the vertical scroll offset.
Then, the coordinates of each content item on the webpage are obtained; the coordinates of any item A are denoted (Xa, Ya), its length (horizontal extent) is W, and its width (vertical extent) is H.
Finally, an exclusion method is used to judge whether element A lies in the user's box-selected region. Content A is judged not to be in the region when Xa+W < X1+ScrollLeft, or X2+ScrollLeft < Xa, or Ya+H < Y1+ScrollTop, or Y2+ScrollTop < Ya, or when Xa < X1+ScrollLeft and Ya < Y1+ScrollTop and Xa+W > X2+ScrollLeft and Ya+H > Y2+ScrollTop all hold; otherwise, content A is judged to be in the region.
Repeating the above three steps yields all content on the webpage that was box-selected.
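The three steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the patent's implementation: the `Rect` type, the function names, and the sample element names are hypothetical, and it assumes the drag runs from the top-left corner to the bottom-right corner (X1 < X2, Y1 < Y2).

```python
from dataclasses import dataclass


@dataclass
class Rect:
    """Axis-aligned rectangle: top-left corner (x, y), width w, height h."""
    x: float
    y: float
    w: float
    h: float


def absolute_selection(x1, y1, x2, y2, scroll_left, scroll_top):
    """Add the page scroll offsets to the viewport coordinates of the drag."""
    return (x1 + scroll_left, y1 + scroll_top,
            x2 + scroll_left, y2 + scroll_top)


def in_selection(a: Rect, sel) -> bool:
    """Exclusion method: A is selected unless one of the exclusion cases holds."""
    sx1, sy1, sx2, sy2 = sel
    # A lies entirely to the left of, right of, above, or below the selection.
    disjoint = (a.x + a.w < sx1 or sx2 < a.x or
                a.y + a.h < sy1 or sy2 < a.y)
    # A strictly encloses the selection (e.g. a page-level wrapper element).
    encloses = (a.x < sx1 and a.y < sy1 and
                a.x + a.w > sx2 and a.y + a.h > sy2)
    return not (disjoint or encloses)


# Hypothetical page: the user scrolled down 200 px before dragging the box.
sel = absolute_selection(100, 50, 400, 300, scroll_left=0, scroll_top=200)
elements = {
    "headline": Rect(120, 260, 200, 30),   # inside the selection
    "footer": Rect(0, 900, 800, 60),       # below the selection
    "body": Rect(0, 0, 800, 1000),         # wraps the whole selection
}
selected = [name for name, r in elements.items() if in_selection(r, sel)]
```

Note that the last exclusion case is what keeps large container elements, which overlap the box only because they wrap it, out of the result.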
Embodiment 3: Embodiment 2 only obtains the content on the webpage that the user box-selected; it cannot obtain content that was not box-selected but that the user equally needs. It is therefore necessary to find what the content in the box-selected region has in common and then crawl the webpage accordingly, so as to obtain all content on the page that the user needs.
In this embodiment, the webpage source code is classified and tagged by machine learning;
Referring to Fig. 2, in particular, artificial intelligence is used to analyze the user's operation behavior: through machine learning, user operations are simulated, and intelligent crawling, tagging, and drill-down are used to simulate the user's box-selection trajectory on the content the user is interested in and to mine the page in depth, so as to obtain content on the page that the user did not box-select but needs.
The specific steps are:
First, all content in the box-selected region is collected to form a set B, and the first element b1 and the last element bn are taken from B. b1 and bn are analyzed to obtain their common parent node; if the parent-node levels of b1 and bn differ, the two elements are considered not to be of the same type, bn is discarded, element bn-1 is taken instead, and the common parent node of b1 and bn-1 is analyzed, and so on until an element bm that shares a common parent node with b1 is found;
Next, the styles of b1 and bm are compared; if they differ, bm is discarded, bm-1 is taken instead, and the styles of b1 and bm-1 are compared, and so on until elements b1 and bz with a common style are found;
Then, all parent nodes of b1 and of bz are obtained and denoted list1 and listz; the deepest node that list1 and listz have in common is denoted node1, which is the nearest common parent node of b1 and bz;
Finally, under node1, the elements that have the same style as b1 are found, giving the set Y = {b1, ..., bz}, which is the content the user needs to obtain.
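The steps above can be sketched on a toy tree as follows. The `Node` class is a hypothetical stand-in for a DOM element, "same parent level" is read here as "same depth in the tree", and the two backward scans (for bm and then for bz) are folded into one pass that requires both depth and style to match; these are assumptions made for illustration, not details confirmed by the text.

```python
class Node:
    """Minimal stand-in for a DOM element (hypothetical, for illustration)."""

    def __init__(self, tag, style="", parent=None):
        self.tag, self.style, self.parent = tag, style, parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def ancestors(self):
        """Ancestors ordered from the root down to the direct parent."""
        chain, node = [], self.parent
        while node is not None:
            chain.append(node)
            node = node.parent
        return chain[::-1]

    def descendants(self):
        """All nodes below this one, in document order."""
        for child in self.children:
            yield child
            yield from child.descendants()


def expand_selection(selected):
    """Return every element sharing b1's style under the nearest common ancestor."""
    b1 = selected[0]
    depth = len(b1.ancestors())
    # Walk back from the last element until depth and style both match b1.
    # b1 itself always matches, so the search cannot come up empty.
    bz = next(b for b in reversed(selected)
              if len(b.ancestors()) == depth and b.style == b1.style)
    # Deepest node shared by both ancestor chains (node1 in the text).
    node1 = None
    for p, q in zip(b1.ancestors(), bz.ancestors()):
        if p is not q:
            break
        node1 = p
    if node1 is None:          # b1 is the root: nothing to expand
        return [b1]
    return [n for n in node1.descendants()
            if n.tag == b1.tag and n.style == b1.style]


# Hypothetical page: a list of five news items plus an unrelated ad block.
root = Node("body")
ul = Node("ul", parent=root)
items = [Node("li", "news-item", ul) for _ in range(5)]
ad = Node("div", "ad", root)
# The user's box only covered the first two items and part of the ad.
result = expand_selection([items[0], items[1], ad])
```

Here the ad is discarded because its depth differs from b1's, and the three list items the user did not box are picked up because they share b1's tag and style under the common `ul` parent.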
Embodiment 4: in Embodiment 3, when a webpage has a lot of content, crawling it takes a certain amount of time; if every box selection were crawled in the manner of Embodiment 3, efficiency would be low.
Different box-selection operations may select the same content and lead to the same content being crawled. If content crawled in the past can be provided to the user again, together with the real-time related information for that content, the number of crawls is reduced and efficiency improves.
In this embodiment, training is performed on the parameters of the user's box selections and their corresponding content. The parameters are the starting-point and end-point coordinates of the selection, (X1, Y1, X2, Y2); with these as input and the class of the corresponding selected content as output, an SVM classifier is built, and the same-region judgment is made from its classification result.
The user's real-time box-selected region is compared with the user's own historical regions or with other users' historical regions to judge whether they are the same region. When they are judged to be the same, the real-time related information of the selected content is obtained according to the historical selection; when they are judged to be different, the content that the user did not box-select but needs, together with its related information, is crawled.
For example: yesterday user A box-selected on a Weibo page and selected three items: sports, food, and military. The system crawled the whole page, obtained the content that the user had not box-selected but needed, and then provided the related information for that content, such as yesterday's sports, food, and military news headlines and URLs. Today user B box-selects on the Weibo page. Although the starting and end points differ, after the judgment the content in the selection is also considered to be sports, food, and military, i.e. user B's region is the same as user A's. The page then does not need to be crawled again for unselected-but-needed content; user A's crawl result is recommended directly to user B, together with the real-time related information for it, such as today's sports, food, and military headlines and URLs. Because the date differs, the related information for the same content in the same region may have been updated.
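The reuse logic in this example can be sketched as a small cache keyed by the classifier's output. The sketch below is an assumption-laden illustration: `SelectionCache`, `toy_classify`, and `toy_crawl` are hypothetical names, and the grid-snapping function merely stands in for the trained SVM, which the text does not specify in detail.

```python
from typing import Callable, Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (X1, Y1, X2, Y2)


class SelectionCache:
    """Reuse earlier crawl results when a new box falls in a known class."""

    def __init__(self, classify: Callable[[Box], str],
                 crawl: Callable[[Box], List[str]]):
        self.classify = classify      # stand-in for the trained SVM
        self.crawl = crawl            # stand-in for the full-page crawler
        self.contents: Dict[str, List[str]] = {}
        self.crawl_count = 0

    def contents_for(self, box: Box) -> List[str]:
        label = self.classify(box)
        if label not in self.contents:        # unseen region: crawl once
            self.contents[label] = self.crawl(box)
            self.crawl_count += 1
        return self.contents[label]           # known region: reuse


def toy_classify(box: Box) -> str:
    # Placeholder for the SVM: snap each coordinate to a 100-pixel grid.
    return ",".join(str(round(v / 100)) for v in box)


def toy_crawl(box: Box) -> List[str]:
    # Placeholder crawler returning the content categories found.
    return ["sports", "food", "military"]


cache = SelectionCache(toy_classify, toy_crawl)
first = cache.contents_for((100, 200, 500, 600))    # user A: triggers a crawl
second = cache.contents_for((110, 190, 505, 610))   # user B: served from cache
```

Only the list of content items is cached here; as in the example, the related information (headlines, URLs) would still be fetched fresh each time because it can change from day to day.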
In other embodiments, referring to Fig. 3, on the basis of Embodiment 4, a user supervision mechanism is added to the classifier: the user judges whether the classification result is the content the user cares about, and the judgment is added to the training set for the next round of training. The training set is cleaned and retrained periodically, noise classes produced by user misoperation are merged, and the correct judgments are finally saved in the training set. The saved training result is loaded whenever the classifier is used, which avoids the waste of resources caused by repeated training.
For the specific steps, refer to Fig. 4:
1. The user makes a box selection, and the input parameters (x1, y1, x2, y2) are obtained.
2. The parameters are input and classified by the SVM classifier built through machine learning, giving the classification result (a template box-selected region), which is displayed on the box-selection page.
3. The user judges whether the content of the classification result is correct, i.e. whether it is the content they want, which provides interaction and improves the training set.
4. If the user judges that the content of the standard box-selected region is not what they want, the acquired parameters are set as a new class and stored in the database, the webpage content is crawled, and the result is output.
5. If the user judges that the content of the standard box-selected region is what they want, the content corresponding to the class is obtained and the previously crawled result is output, avoiding a repeated crawl; the parameters are stored in the database to improve the training set data.
6. The training set data is cleaned periodically: noise classes produced by user misoperation (classes that actually point to the same content) are merged to improve classification accuracy.
7. The training set is retrained periodically and the training result is saved; the saved result is loaded whenever the classifier is used, which avoids the waste of resources caused by repeated training.
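Steps 3 to 6 can be sketched as two small functions over an in-memory training set. The data layout (a dict mapping a class label to its boxes and its content set), the function names, and the rule "two classes are duplicates when their content sets are equal" are assumptions made for this illustration; the text does not say how noise classes are detected.

```python
def record_feedback(training_set, box, predicted, accepted, new_contents=None):
    """Steps 3-5: store the user's verdict on a prediction as a training sample.

    training_set maps label -> {"boxes": [...], "contents": frozenset(...)}.
    """
    if accepted:
        # Step 5: the prediction was right, so the box joins that class.
        training_set[predicted]["boxes"].append(box)
        return predicted
    # Step 4: the prediction was rejected, so the box starts a new class.
    n = len(training_set)
    while f"class_{n}" in training_set:
        n += 1
    label = f"class_{n}"
    training_set[label] = {"boxes": [box],
                           "contents": frozenset(new_contents or ())}
    return label


def merge_noise_classes(training_set):
    """Step 6: merge classes that actually point at the same content."""
    merged, first_label_for = {}, {}
    for label, entry in training_set.items():
        keep = first_label_for.setdefault(entry["contents"], label)
        if keep == label:
            merged[label] = {"boxes": list(entry["boxes"]),
                             "contents": entry["contents"]}
        else:
            merged[keep]["boxes"].extend(entry["boxes"])
    return merged


training = {"class_0": {"boxes": [(100, 200, 500, 600)],
                        "contents": frozenset({"sports", "food"})}}
# A mis-click rejects a correct prediction, which creates a duplicate class.
record_feedback(training, (105, 195, 498, 603), "class_0", accepted=False,
                new_contents={"sports", "food"})
record_feedback(training, (102, 201, 501, 599), "class_0", accepted=True)
cleaned = merge_noise_classes(training)
```

After cleaning, the classifier would be retrained on `cleaned` and the fitted model saved, so that later calls load the saved result instead of retraining (step 7).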
Embodiment 5: on the basis of Embodiment 1, Embodiment 2, Embodiment 3, or Embodiment 4, users are classified by their tags to enable intelligent recommendation of box selections; the tags include industry, position, and region. Machine learning, supervised learning, and reinforcement learning are applied continuously to the user's directed box-selection behavior according to the user's identity features, so that box selections of the content the user cares about are recommended automatically. The intelligent recommendation also includes a user judgment mechanism: a Bayes classifier is trained on user-behavior data samples. When a recommended selection meets the user's needs, the classifier automatically stores the user behavior and the recommendation result in the data sample library; when it does not, the program automatically jumps to the manual box-selection interface and learns from the user's behavior and selection result, which improves the accuracy of the automatic recommendations.
In this embodiment, referring to Fig. 5, the specific steps are:
1. The user enters the box-selection page, and the system asks whether to enable automatic recommendation; if the user chooses No, the page jumps to manual box selection.
2. If the user chooses Yes, the user's tags are obtained as input parameters.
3. A Bayes classifier is built and classifies according to the input parameters; the result is the probability that the user cares about each class, and the class with the highest probability is output as the result.
4. The user judges whether the automatic recommendation meets their actual needs; if so, the result is output, and the user and result are stored in the database to improve the training samples.
5. If not, the page jumps to the manual box-selection interface for manual selection, and the user and result are stored in the database to improve the training samples.
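Step 3 can be illustrated with a small categorical naive Bayes classifier written in plain Python. The text only says "Bayes classifier", so the naive (independent-tag) assumption, the add-one smoothing, the class name `TagNaiveBayes`, and the sample tags and labels are all choices made for this sketch.

```python
from collections import Counter, defaultdict


class TagNaiveBayes:
    """Categorical naive Bayes over user tags with add-one smoothing."""

    def __init__(self):
        self.class_counts = Counter()
        self.feature_counts = defaultdict(Counter)  # (class, slot) -> value counts
        self.values = defaultdict(set)              # slot -> values seen so far

    def add_sample(self, tags, label):
        """Steps 4-5: store one (user tags, accepted content class) sample."""
        self.class_counts[label] += 1
        for slot, value in enumerate(tags):
            self.feature_counts[(label, slot)][value] += 1
            self.values[slot].add(value)

    def probabilities(self, tags):
        """Step 3: probability that a user with these tags cares about each class."""
        if not self.class_counts:
            raise ValueError("no training samples yet")
        total = sum(self.class_counts.values())
        scores = {}
        for label, n in self.class_counts.items():
            p = n / total                                  # class prior
            for slot, value in enumerate(tags):
                seen = self.feature_counts[(label, slot)][value]
                p *= (seen + 1) / (n + len(self.values[slot]))
            scores[label] = p
        norm = sum(scores.values())
        return {label: p / norm for label, p in scores.items()}

    def recommend(self, tags):
        """Return the class with the highest probability."""
        probs = self.probabilities(tags)
        return max(probs, key=probs.get)


# Tags are (industry, position, region); labels are hypothetical content classes.
model = TagNaiveBayes()
model.add_sample(("finance", "analyst", "Xiamen"), "stock-news")
model.add_sample(("finance", "manager", "Beijing"), "stock-news")
model.add_sample(("sports", "coach", "Xiamen"), "match-results")
probs = model.probabilities(("finance", "analyst", "Beijing"))
choice = model.recommend(("finance", "analyst", "Beijing"))
```

Whether the user accepts the recommendation or falls back to manual box selection, calling `add_sample` with the user's tags and the class they ended up with feeds the outcome back into the training samples, as steps 4 and 5 describe.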
In conclusion, compared with the prior art, the method for directionally monitoring a webpage proposed in this application obtains the related information of the selected content directly by box-selecting on the webpage, simply and quickly, and it can automatically obtain content on the page that is of the same kind as the selected content but was not box-selected, so the user does not have to box-select repeatedly and works more efficiently. The method can also record historical box-selection operations and judge whether different selections correspond to the same content; when they do, it provides the content and related information obtained by the historical selection, which avoids crawling the webpage repeatedly and wasting resources. A manual supervision mechanism and a manual judgment mechanism are also added to improve the accuracy and reliability of the method.
The present invention has been described above by way of example with reference to the drawings. Its implementation is clearly not limited to the manner described: any insubstantial improvement made using the inventive concept and technical solution of the invention, or any direct application of that concept and solution to other occasions without improvement, falls within the protection scope of the invention.

Claims (10)

1. the method that a kind of pair of webpage is oriented monitoring, which is characterized in that carry out frame choosing, crawl frame choosing to the content on webpage In each content, provide the relevant information of each content, the relevant information includes title, abstract, network address, Web page text.
2. the method that a kind of pair of webpage according to claim 1 is oriented monitoring, which is characterized in that the frame choosing uses Screenshotss positioning method.
3. the method that a kind of pair of webpage according to claim 1 is oriented monitoring, which is characterized in that selected according to subscriber frame Region obtains location information, then selects the position of content to be compared with subscriber frame the position of all elements in webpage, from And preliminary screening goes out matching content, the content that the matching content is wanted to know about by user.
4. the method that a kind of pair of webpage according to claim 3 is oriented monitoring, which is characterized in that when frame selects, record The initial point coordinate of frame choosing is (X1, Y1), and end of record point coordinate is (X2, Y2), and starting point and ending point surrounds the frame of rectangle Favored area, the coordinate of the frame favored area are superimposed with caused by user pulls when webpage scroll bar and are displaced up and down, obtain frame The absolute coordinate of the starting point and ending point of favored area is (X1+ScrollLeft, Y1+ScrollTop) and (X2+ respectively ScrollLeft,Y2+ScrollTop);Obtain webpage in each content coordinate, remember any one content A coordinate be (Xa, Ya), and the length of content A is W, width H;Judge whether the element A on webpage is included in the area of subscriber frame choosing using exclusive method In domain, when Xa+W < X1+ScrollLeft perhaps X2+ScrollLeft < Xa perhaps Ya+H < Y1+ScrollTop or Y2+ScrollTop < Ya or Xa < X1+ScrollLeft and Ya < Y1+ScrollTop and Xa+W > X2+ScrollLeft And when Ya+H > Y2+ScrollTop, it is judged as that content A not in the frame favored area, otherwise, is judged as content A in the frame In favored area.
5. the method that a kind of pair of webpage according to claim 1 is oriented monitoring, which is characterized in that pass through machine learning Classification and labeling processing are carried out to web page source code;Mark, lower brill are crawled by machine learning, analog subscriber operation, intelligence The web page contents of interest to user carry out the trace simulation of subscriber frame choosing, to crawl that the non-frame of user is chosen and be that user needs The content wanted.
6. the method that a kind of pair of webpage according to claim 5 is oriented monitoring, which is characterized in that acquisition frame favored area Interior all the elements form set B, and header element b1 and last element bn is obtained from set B, analyze header element b1 and last element bn Their common father nodes are obtained, if the father node level of header element b1 and last element bn is different, then it is assumed that two elements are not It is the same type, reacquires element bn-1 after giving up last element bn, analyze the common parent section of header element b1 and element bn-1 Point, and so on possess the element bm of common parent node with b1 until finding;Whether the pattern for analyzing b1 and bm is identical, if The two pattern is different, casts out bm element and reacquires bm-1 element, reanalyses the pattern of b1 and bm-1 element, and so on it is straight To finding the element b1 and bz for possessing common pattern;Obtain respectively b1 element and bz element all father nodes be denoted as list1 and Listz compares the maximum same node point of list1 and listz level and is denoted as node1, then node1 is the nearest of b1 and bz element Common parent;Using node1 node, find the element that common pattern is known as with b1 member, obtain set Y=b1 ... ..., Bz }, set Y is the content that user needs to obtain.
7. the method that a kind of pair of webpage according to claim 5 is oriented monitoring, which is characterized in that user is real-time The frame favored area of the frame favored area or other users history of frame favored area and history compares, and judges whether to belong to identical frame choosing Region;When be judged as belong to identical when, according to the frame of history select, obtain frame choosing content real-time relevant information;When being judged as When being not belonging to identical, relevant information that is that the non-frame of user is chosen and being content and content that user needs is crawled.
8. the method that a kind of pair of webpage according to claim 7 is oriented monitoring, which is characterized in that the identical frame The determination method of favored area is, by the starting point coordinate and the terminating point coordinate setting (X1, Y1, X2, Y2) that obtain subscriber frame choosing SVM classifier is constructed as input parameter, according to the determining in entire webpage of all the elements between starting point and terminating point Position classifies to frame favored area, and the judgement of same area is carried out further according to classification results.
9. The method for directionally monitoring a webpage according to claim 8, characterized in that a user supervision mechanism is added to the classifier: the user judges whether each classification result is content the user is interested in, and the judgment is added to the training set for the next round of training; the training set is periodically cleaned and retrained, noise classes produced by user misoperation are merged, and the correct judgments are finally saved in the training set; when the classifier is used, the saved training result is called, avoiding the waste of resources caused by repeated training.
10. The method for directionally monitoring a webpage according to claim 1 or 5, characterized in that machine learning, supervised learning and reinforcement learning are continuously applied to the user's directional frame-selection behavior according to the user's identity characteristics, so that frame selections of the content the user is interested in are recommended automatically and intelligently. The automatic frame-selection recommendation uses a Bayes classifier trained to classify data samples of user behavior; when a recommended frame selection meets the user's needs, the classifier automatically stores the user behavior and the recommendation result in the data sample library; when a recommended frame selection does not meet the user's needs, the program automatically jumps to the user's manual frame-selection interface and at the same time learns from the user behavior and the frame-selection result, so as to improve the accuracy of the classifier's automatic frame-selection recommendations.
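
The backtracking and ancestor comparison described in claim 6 can be illustrated with a minimal Python sketch. It is not the patentee's implementation. The `Node` class is a hypothetical stand-in for a DOM element, "parent-node level" is assumed to mean depth in the tree, and "style" is assumed to be a single class-like string; none of these names appear in the patent.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass(eq=False, repr=False)
class Node:
    """Hypothetical stand-in for a DOM element (assumption, not from the patent)."""
    tag: str
    style: str = ""                       # assumed: one class-like style label
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)

    def add(self, child: "Node") -> "Node":
        child.parent = self
        self.children.append(child)
        return child


def ancestors(node: Node) -> List[Node]:
    """All parent nodes of `node`, ordered from the root downwards."""
    chain = []
    current = node.parent
    while current is not None:
        chain.append(current)
        current = current.parent
    return chain[::-1]


def nearest_common_parent(a: Node, b: Node) -> Optional[Node]:
    """Deepest node shared by both ancestor lists (node1 in claim 6)."""
    common = None
    for x, y in zip(ancestors(a), ancestors(b)):
        if x is not y:
            break
        common = x
    return common


def descendants(node: Node) -> Iterator[Node]:
    for child in node.children:
        yield child
        yield from descendants(child)


def expand_selection(selected: List[Node]) -> List[Node]:
    """Return set Y: every element under node1 that is styled like b1."""
    b1 = selected[0]
    depth = len(ancestors(b1))
    idx = len(selected) - 1
    # Stage 1: discard bn, b(n-1), ... until the parent level matches b1 (bm).
    while idx > 0 and len(ancestors(selected[idx])) != depth:
        idx -= 1
    # Stage 2: discard bm, b(m-1), ... until the style matches b1 (bz).
    while idx > 0 and selected[idx].style != b1.style:
        idx -= 1
    bz = selected[idx]
    node1 = nearest_common_parent(b1, bz)
    if node1 is None:                     # b1 is the root; nothing to expand
        return [b1]
    return [n for n in descendants(node1)
            if n.tag == b1.tag and n.style == b1.style]


# Usage: the user frames two list items plus an unrelated ad element.
root = Node("body")
ul = root.add(Node("ul", "list"))
items = [ul.add(Node("li", "item")) for _ in range(4)]
ad = root.add(Node("div", "footer")).add(Node("p")).add(Node("span", "ad"))
result = expand_selection([items[0], items[1], ad])
print(len(result))  # all four list items are recovered
```

In this sketch the ad element is dropped in the first stage because its depth differs from that of b1, and the two list items that were not framed are recovered by scanning the subtree under node1.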
CN201811604429.6A 2018-12-26 2018-12-26 Method for directionally monitoring webpage Active CN109829092B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811604429.6A CN109829092B (en) 2018-12-26 2018-12-26 Method for directionally monitoring webpage

Publications (2)

Publication Number Publication Date
CN109829092A true CN109829092A (en) 2019-05-31
CN109829092B CN109829092B (en) 2021-05-28

Family

ID=66861232

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811604429.6A Active CN109829092B (en) 2018-12-26 2018-12-26 Method for directionally monitoring webpage

Country Status (1)

Country Link
CN (1) CN109829092B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20130106519A (en) * 2012-03-20 2013-09-30 삼성전자주식회사 Method and apparatus for managing history of web-browser
CN105138605A (en) * 2015-08-07 2015-12-09 苏州博优赞信息科技有限责任公司 User behavior real-time monitoring method based on webpage
US20160098178A1 (en) * 2011-08-30 2016-04-07 Adobe Systems Incorporated Identifying selected dynamic content regions
CN106294482A (en) * 2015-06-04 2017-01-04 阿里巴巴集团控股有限公司 The treating method and apparatus of webpage frame selection operation
CN106326316A (en) * 2015-07-08 2017-01-11 腾讯科技(深圳)有限公司 Web page advertisement filtering method and device
CN106897287A (en) * 2015-12-18 2017-06-27 中国电信股份有限公司 Homepage Publishing decimation in time method and the device for Homepage Publishing decimation in time
CN107609123A (en) * 2017-09-14 2018-01-19 西安领讯卓越信息技术有限公司 A kind of method presented based on news commending system polymerization news
CN107943812A (en) * 2017-05-24 2018-04-20 成都明途科技有限公司 Recommend method for the news of user's centralized integration resource
CN108279966A (en) * 2018-02-13 2018-07-13 广东欧珀移动通信有限公司 Webpage capture method, apparatus, terminal and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
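JUN WANG: "A classification approach for less popular webpages based on latent semantic analysis and rough set model", Expert Systems with Applications *
曹文俊 (Cao Wenjun): "面向推荐内容呈现位置的网页布局及区域权重研究" [Research on web page layout and region weights with respect to the display position of recommended content], 《中国优秀硕士学位论文全文数据库 信息科技辑》 [China Master's Theses Full-text Database, Information Science and Technology] *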

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110413499A (en) * 2019-07-30 2019-11-05 秒针信息技术有限公司 Information on services monitoring method, device, equipment and storage medium
CN110413499B (en) * 2019-07-30 2023-12-19 秒针信息技术有限公司 Service information monitoring method, device, equipment and storage medium
CN112560403A (en) * 2019-09-26 2021-03-26 北京国双科技有限公司 Text processing method and device and electronic equipment
CN112579852A (en) * 2019-09-30 2021-03-30 厦门邑通软件科技有限公司 Interactive webpage data accurate acquisition method
CN112579852B (en) * 2019-09-30 2023-01-10 厦门邑通智能科技集团有限公司 Interactive webpage data accurate acquisition method
CN113722640A (en) * 2021-08-26 2021-11-30 长沙博为软件技术股份有限公司 Method, device and medium for collecting webpage configurable items based on RPA
CN114025210A (en) * 2021-11-01 2022-02-08 深圳小湃科技有限公司 Popup shielding method, equipment, storage medium and device
CN114025210B (en) * 2021-11-01 2023-02-28 深圳小湃科技有限公司 Popup shielding method, equipment, storage medium and device

Also Published As

Publication number Publication date
CN109829092B (en) 2021-05-28

Similar Documents

Publication Publication Date Title
CN109829092A (en) Method for directionally monitoring webpage
CN108874992A (en) The analysis of public opinion method, system, computer equipment and storage medium
US20060288015A1 (en) Electronic content classification
CN109815952A (en) Brand name recognition methods, computer installation and computer readable storage medium
JP2012529688A (en) Update notification method and system
CN102890692A (en) Webpage information extraction method and webpage information extraction system
CN109543925A (en) Risk Forecast Method, device, computer equipment and storage medium based on machine learning
CN107066548B (en) A kind of method that web page interlinkage is extracted in double dimension classification
US20200225927A1 (en) Methods and systems for automating computer application tasks using application guides, markups and computer vision
Dyvak et al. Recognition of Relevance of Web Resource Content Based on Analysis of Semantic Components
US8799791B2 (en) System for use in editorial review of stored information
Murthy XML URL classification based on their semantic structure orientation for web mining applications
CN109992723B (en) User interest tag construction method based on social network and related equipment
CN116777692A (en) Online learning method, device, equipment and storage medium based on data analysis
KR102532216B1 (en) Method for establishing ESG database with structured ESG data using ESG auxiliary tool and ESG service providing system performing the same
CN116595191A (en) Construction method and device of interactive low-code knowledge graph
US20220114516A1 (en) Systems and methods for discovery of automation opportunities
Grechanik et al. Differencing graphical user interfaces
CN114238581A (en) Intelligent retrieval system and method based on semantic understanding
CN111027319A (en) Method and device for analyzing natural language time words and computer equipment
KR20210080000A (en) User adaptive news service method and server based on deep learning
CN109214864A (en) A kind of advertisement recognition method and device, electronic equipment
CN116775813B (en) Service searching method, device, electronic equipment and readable storage medium
CN112749990B (en) Data analysis method and system based on tourist identity
Gao Intelligent Positioning Method of Paging Buttons Based on Machine Learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP03 Change of name, title or address

Address after: 361000 One of Unit 702, No. 1, Xishanwei Road, Phase III Software Park, Xiamen Torch High-tech Zone, Xiamen, Fujian Province

Patentee after: Xiamen Yitong Intelligent Technology Group Co.,Ltd.

Address before: B11, 4th floor, 1036 Xiahe Road, Siming District, Xiamen City, Fujian Province, 361000

Patentee before: XIAMEN ETOM SOFTWARE TECHNOLOGY Co.,Ltd.

PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: A method for directional monitoring of web pages

Effective date of registration: 20220816

Granted publication date: 20210528

Pledgee: Xiamen Branch of PICC

Pledgor: Xiamen Yitong Intelligent Technology Group Co.,Ltd.

Registration number: Y2022980012793

PC01 Cancellation of the registration of the contract for pledge of patent right

Granted publication date: 20210528

Pledgee: Xiamen Branch of PICC

Pledgor: Xiamen Yitong Intelligent Technology Group Co.,Ltd.

Registration number: Y2022980012793

PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: A method for targeted monitoring of web pages

Granted publication date: 20210528

Pledgee: Agricultural Bank of China Limited Xiamen Lianqian Branch

Pledgor: Xiamen Yitong Intelligent Technology Group Co.,Ltd.

Registration number: Y2024980004722