CN105302527A - Thread organization method - Google Patents
Thread organization method
- Publication number
- CN105302527A (application CN201510716958.5A)
- Authority
- CN
- China
- Prior art keywords
- thread
- task
- recovery
- fault
- threads
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Computer And Data Communications (AREA)
Abstract
The invention discloses a thread organization method. The method comprises the following steps. A: on a mount point loop, a thread monitors the operation of its next thread and checks whether a fault exists; if a fault exists, the faulty thread is removed from the mount point loop and sent to a thread recovery area, and it is determined whether this is the first occurrence of the fault; if it is not the first occurrence, the task of the next thread is marked as a problem task and is no longer executed; if it is the first occurrence, the method proceeds directly to step B. B: it is determined whether a global task needs to be executed; if a recovery task exists, the recovery task is executed to reclaim all threads in the thread recovery area, and tasks that carry no problem-task mark are added to a resurrection task list; if no recovery task exists, the method proceeds to step C. C: it is determined whether the resurrection task list contains a task that needs to be resurrected, and if so, a new thread is generated to execute that task. The thread organization method can improve the operational stability of web crawler threads.
Description
Technical field
The present invention relates to the field of computer computing-resource scheduling, and in particular to the field of applied computer network computing.
Background art
A web crawler is a program or script that automatically fetches web information according to certain rules. Depending on the designer's purpose, a web crawler can be implemented in various forms. Implementations usually adopt multithreading, and the thread design often involves complicated loop computations; together these make it necessary, in web crawler design, to manage the running state of each individual thread.
A computer thread, on the other hand, is the smallest unit of a program's execution flow and the basic unit that the system schedules and dispatches independently. A thread owns no system resources of its own beyond the few that are indispensable while it runs, but it shares all the resources of its process with the other threads belonging to the same process.
A very important component of multithreading is the daemon thread. The difference between a daemon thread and an ordinary thread is that once all ordinary threads in a process have finished running, the process terminates regardless of whether any daemon thread is still running. Daemon threads are typically used to provide auxiliary functions for ordinary threads.
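The daemon-thread behaviour described above can be illustrated with a minimal Python sketch. It is not part of the patent; the function names `housekeeping` and `worker` are hypothetical, and Python's `threading` module is used only because its `daemon` flag matches the description.

```python
import threading
import time


def housekeeping(stop_event: threading.Event) -> None:
    """Auxiliary loop: runs until asked to stop or until the process exits."""
    while not stop_event.is_set():
        time.sleep(0.01)


def worker(results: list) -> None:
    """Ordinary (non-daemon) thread that does the real work."""
    results.append("crawled")


stop = threading.Event()
results: list = []

# daemon=True: this thread does not keep the process alive on its own.
helper = threading.Thread(target=housekeeping, args=(stop,), daemon=True)
main_worker = threading.Thread(target=worker, args=(results,))

helper.start()
main_worker.start()
main_worker.join()  # wait only for the ordinary thread

# At this point the interpreter could exit even though `helper` is still running.
helper_alive_after_worker = helper.is_alive()

# Stop the helper explicitly so the sketch ends cleanly.
stop.set()
helper.join(timeout=1.0)
```

Note that nothing in this arrangement supervises `helper` itself, which is the gap the stable nucleus described next is meant to close.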
The term "stable nucleus" refers to a mode of operation in which the daemon function is built into the body of each running thread, so that the threads guard one another and all threads in the process operate as one centralized group.
The stable nucleus differs from the traditional approach to concurrent programs in that it builds the daemon thread's work into the ordinary worker threads. On the one hand, this avoids the contradiction that the daemon thread itself runs unsupervised; on the other hand, it reduces the delay caused by frequent switching between threads under high concurrency.
As shown in Fig. 1, the stable nucleus consists of two parts: the mount point loop and the thread recovery area. A mount point is the run-marker object corresponding to each thread and contains information about that thread's execution. Operations such as checks on a thread are all carried out through its mount point as intermediary.
The structure of the mount point loop is shown in Fig. 2, where 1, 2, 3, 4, 5, 6, 7 and 8 are the mount points corresponding to eight threads. These mount points are logically linked in a chain, each to the next, and joined end to end to form a loop.
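One way to picture the loop of eight mount points is a circular singly linked list. The following Python sketch is only an illustration under that assumption; the class name `MountPoint`, its fields and the helper functions are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MountPoint:
    """Marker object for one thread (field names are illustrative assumptions)."""
    thread_id: int
    # Successor on the loop; excluded from repr/eq because the structure is circular.
    next: Optional["MountPoint"] = field(default=None, repr=False, compare=False)


def build_loop(thread_ids: List[int]) -> MountPoint:
    """Link the mount points in order and close the chain into a loop.

    Assumes a non-empty list of thread ids.
    """
    nodes = [MountPoint(t) for t in thread_ids]
    for i, node in enumerate(nodes):
        node.next = nodes[(i + 1) % len(nodes)]
    return nodes[0]


def walk(head: MountPoint) -> List[int]:
    """Visit every mount point exactly once, starting at head."""
    ids = [head.thread_id]
    node = head.next
    while node is not head:
        ids.append(node.thread_id)
        node = node.next
    return ids


head = build_loop([1, 2, 3, 4, 5, 6, 7, 8])
```

Because every mount point has exactly one successor, each thread has exactly one neighbour to monitor, and the last mount point's successor is the first.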
The thread recovery area, shown in Fig. 3, can be formed from a chained queue of mount points that holds only the mount points of threads that have gone wrong. For example, if the mount points of threads 1, 2 and 3 are placed in the recovery area, its structure is as shown in Fig. 3. During normal operation, mount points frequently move between the loop and the thread recovery area; once a thread's fault is confirmed, the thread is finally shut down and reclaimed in the thread recovery area.
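The movement of mount points between the loop and the recovery area can be sketched as follows. This is a simplified illustration, not the patent's implementation: the class `StableNucleus` and its methods are hypothetical names, a Python list stands in for the loop, and a `deque` stands in for the chained queue.

```python
from collections import deque


class StableNucleus:
    """Toy model: a loop of mount points plus a FIFO recovery area."""

    def __init__(self, thread_ids):
        self.loop = list(thread_ids)   # list order = order on the loop
        self.recovery_area = deque()   # chained queue of faulty mount points

    def next_of(self, thread_id):
        """Return the successor of thread_id on the loop (wraps around)."""
        i = self.loop.index(thread_id)
        return self.loop[(i + 1) % len(self.loop)]

    def move_to_recovery(self, thread_id):
        """Take a mount point off the loop and queue it for reclamation."""
        self.loop.remove(thread_id)
        self.recovery_area.append(thread_id)

    def reclaim_all(self):
        """Empty the recovery area and return what was reclaimed."""
        reclaimed = list(self.recovery_area)
        self.recovery_area.clear()
        return reclaimed


nucleus = StableNucleus(range(1, 9))
for faulty in (1, 2, 3):
    nucleus.move_to_recovery(faulty)
```

After the three moves the recovery area holds mount points 1, 2 and 3 in order, which corresponds to the situation described for Fig. 3, and the loop closes over the remaining five.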
While running, a multithreaded web crawler often falls into unsafe code because the network information changes and the crawl strategy switches frequently, and the crawler then appears to hang ("seemingly dead"). Diagnosing such situations is usually time-consuming and laborious, and the diagnosis does not generalize.
Summary of the invention
In view of this, the object of the present invention is to overcome the problem in the prior art that a multithreaded web crawler, while running, often falls into unsafe code and appears to hang because the network information changes and the crawl strategy switches frequently. A thread organization method is proposed that uses a general-purpose stable nucleus technique to greatly improve the operational stability of web crawler threads.
To achieve this object, the present invention adopts the following technical solution.
A thread organization method, comprising the steps of:
A: on the mount point loop, a thread monitors the operation of its next thread and checks whether a fault exists; if a fault exists, the faulty thread is removed from the mount point loop and sent to the thread recovery area, and it is determined whether this is the first occurrence of the fault,
wherein, if it is not the first occurrence of the fault, the task of the next thread is marked as a problem task, and problem tasks are no longer executed,
and if it is the first occurrence of the fault, the method proceeds directly to step B;
B: determine whether a global task needs to be executed,
if a recovery task exists, execute the recovery task, reclaim all threads in the thread recovery area, and add the tasks that are not marked as problem tasks to the resurrection task list,
if no recovery task exists, proceed to step C;
C: determine whether the resurrection task list contains a task that needs to be resurrected, and if so, generate a new thread to execute that task.
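Steps A to C above can be sketched as three small functions over shared state. This is a minimal illustration under stated assumptions, not the patented implementation: the names `Nucleus`, `step_a`, `step_b` and `step_c`, the per-task fault counter, and the `is_faulty` callback are all hypothetical, and no real threads are started.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Nucleus:
    loop: list                      # thread names in loop order
    task_of: dict                   # thread name -> task it is running
    recovery_area: deque = field(default_factory=deque)
    fault_count: dict = field(default_factory=dict)   # task -> faults seen (assumption)
    problem_tasks: set = field(default_factory=set)
    resurrection_list: deque = field(default_factory=deque)


def step_a(n: Nucleus, monitor: str, is_faulty) -> None:
    """Step A: `monitor` checks its successor on the loop."""
    nxt = n.loop[(n.loop.index(monitor) + 1) % len(n.loop)]
    if nxt == monitor or not is_faulty(nxt):
        return
    n.loop.remove(nxt)
    n.recovery_area.append(nxt)
    task = n.task_of[nxt]
    n.fault_count[task] = n.fault_count.get(task, 0) + 1
    if n.fault_count[task] > 1:          # not the first occurrence
        n.problem_tasks.add(task)


def step_b(n: Nucleus) -> bool:
    """Step B: run the recovery task if one exists; return whether it ran."""
    if not n.recovery_area:
        return False
    while n.recovery_area:
        dead = n.recovery_area.popleft()
        task = n.task_of.pop(dead)
        if task not in n.problem_tasks:
            n.resurrection_list.append(task)
    return True


def step_c(n: Nucleus, new_thread: str):
    """Step C: give one resurrected task to a new thread, if any is pending."""
    if not n.resurrection_list:
        return None
    task = n.resurrection_list.popleft()
    n.loop.append(new_thread)
    n.task_of[new_thread] = task
    return new_thread
```

A task that fails once is queued for resurrection; a task that fails again is marked as a problem task in `step_a` and is therefore skipped by `step_b`.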
In addition, before step A the method further comprises:
A01: start the multithreaded crawler system;
A02: initialize the connection between the multithreaded crawler and the database, and check whether the database required for the crawler to run is valid;
A03: the system loads shared resources;
A04: the system generates multiple threads in a thread pool, and each thread independently parses out secondary inlets;
A05: each thread sends the secondary inlets it has obtained to a public filter, which selects the secondary inlets that are not duplicates;
A06: generate new threads in the thread pool to crawl and parse the secondary inlets, and store the results in the database.
A distributed consensus method is used to determine the global computation thread responsible for the recovery task of the next thread.
A public Bloom filter is used to select the secondary inlets that are not duplicates.
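A public Bloom filter of the kind mentioned above can be sketched in a few lines. The patent does not give its parameters or hash functions, so the bit-array size, the number of hashes, the use of SHA-256 and the names `BloomFilter` and `select_new` are all assumptions made for illustration.

```python
import hashlib
import threading


class BloomFilter:
    """Small thread-safe Bloom filter (sizes and hashing are illustrative)."""

    def __init__(self, num_bits: int = 1 << 16, num_hashes: int = 4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)
        self._lock = threading.Lock()   # the filter is shared by all crawler threads

    def _positions(self, item: str):
        # Derive num_hashes bit positions by salting one hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def __contains__(self, item: str) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def add_if_new(self, item: str) -> bool:
        """Atomically test and insert; True means the item was not seen before."""
        with self._lock:
            if item in self:
                return False
            self.add(item)
            return True


def select_new(bloom: BloomFilter, links):
    """Keep only the links the filter has not seen, recording them as seen."""
    return [link for link in links if bloom.add_if_new(link)]
```

A Bloom filter never reports a seen link as new, but it can occasionally report a new link as seen; that trade-off buys a fixed, small memory footprint for a very large set of visited links.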
The thread organization method of the present invention uses a stable nucleus, a structure built on mutual communication between threads. By constructing a stable nucleus, a process can automatically monitor the operation of its key worker threads, which greatly improves the operational stability of the worker threads. In addition, using the standardized calling interface provided by the stable nucleus avoids the time and effort of separately designing daemon threads for key worker threads, which improves software development efficiency.
Brief description of the drawings
Fig. 1 is a schematic structural diagram of the stable nucleus.
Fig. 2 is a schematic structural diagram of the mount point loop in the stable nucleus.
Fig. 3 is a schematic structural diagram of the thread recovery area in the stable nucleus.
Fig. 4 is a schematic flowchart of the thread organization method in a specific embodiment of the invention.
Fig. 5 is a schematic flowchart of the thread organization method in a specific embodiment of the invention.
Figs. 6a-6d are schematic diagrams of the mount point loop and the thread recovery area of the stable nucleus in a specific embodiment of the invention.
Detailed description of the embodiments
The present invention is described in detail below with reference to the accompanying drawings.
Detailed example embodiments are disclosed below. However, the specific structural and functional details disclosed herein serve only to describe the example embodiments.
It should be understood that the present invention is not limited to the specific example embodiments disclosed, but covers all modifications, equivalents and alternatives falling within the scope of this disclosure. Throughout the description of the drawings, identical reference numerals denote identical elements.
It should also be understood that the term "and/or" as used herein includes any and all combinations of one or more of the associated listed items. It should further be understood that when a component or unit is referred to as being "connected" or "coupled" to another component or unit, it may be directly connected or coupled to the other component or unit, or intervening components or units may be present. Other words used to describe relationships between components or units should be interpreted in the same way (for example, "between" versus "directly between", "adjacent" versus "directly adjacent", and so on).
As shown in Figs. 4-5, the invention discloses a thread organization method comprising the following steps:
A: on the mount point loop, a thread monitors the operation of its next thread and checks whether a fault exists; if a fault exists, the faulty thread is removed from the mount point loop and sent to the thread recovery area, and it is determined whether this is the first occurrence of the fault,
wherein, if it is not the first occurrence of the fault, the task of the next thread is marked as a problem task, and problem tasks are no longer executed,
and if it is the first occurrence of the fault, the method proceeds directly to step B;
B: determine whether a global task needs to be executed,
if a recovery task exists, execute the recovery task, reclaim all threads in the thread recovery area, and add the tasks that are not marked as problem tasks to the resurrection task list,
if no recovery task exists, proceed to step C;
C: determine whether the resurrection task list contains a task that needs to be resurrected, and if so, generate a new thread to execute that task.
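The text does not say how a thread decides that its neighbour is faulty. One common approach, shown here purely as an assumption, is a heartbeat: each worker records a timestamp in its mount point whenever it makes progress, and the monitoring thread treats a stale timestamp as a fault. The names `MountPoint`, `beat` and `is_faulty` and the timeout value are hypothetical.

```python
import time
from dataclasses import dataclass, field


@dataclass
class MountPoint:
    """Mount point carrying the last time its thread reported progress."""
    name: str
    last_heartbeat: float = field(default_factory=time.monotonic)

    def beat(self, now=None) -> None:
        """Called by the worker itself at each progress point."""
        self.last_heartbeat = time.monotonic() if now is None else now


def is_faulty(mp: MountPoint, timeout_s: float, now=None) -> bool:
    """Called by the previous thread on the loop: stale heartbeat means fault."""
    now = time.monotonic() if now is None else now
    return (now - mp.last_heartbeat) > timeout_s


# Explicit timestamps make the behaviour easy to follow.
mp_b = MountPoint("B", last_heartbeat=100.0)
healthy_at_120 = is_faulty(mp_b, timeout_s=30.0, now=120.0)   # 20 s of silence
stale_at_131 = is_faulty(mp_b, timeout_s=30.0, now=131.0)     # 31 s of silence
```

A thread stuck in an endless loop or blocked on I/O stops calling `beat`, so it is detected without having to send any signal itself.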
The present invention uses a stable nucleus, a structure built on mutual communication between threads. By constructing a stable nucleus, a process can automatically monitor the operation of its key worker threads, which greatly improves the operational stability of the worker threads. In addition, using the standardized calling interface provided by the stable nucleus avoids the time and effort of separately designing daemon threads for key worker threads, which improves software development efficiency.
Before step A, the method further comprises:
A01: start the multithreaded crawler system;
A02: initialize the connection between the multithreaded crawler and the database, and check whether the database required for the crawler to run is valid;
A03: the system loads shared resources;
A04: the system generates multiple threads in a thread pool, and each thread independently parses out secondary inlets;
A05: each thread sends the secondary inlets it has obtained to a public filter, which selects the secondary inlets that are not duplicates;
A06: generate new threads in the thread pool to crawl and parse the secondary inlets, and store the results in the database.
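The start-up steps A04 to A06 can be sketched with a thread pool. This is an illustration only: no network access is performed, the dictionary `FAKE_WEB` stands in for real pages, a plain dictionary stands in for the database, a locked set stands in for the public filter, and all function and class names are hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory "web": entry page -> secondary inlets found on it.
FAKE_WEB = {
    "entry-1": ["p/1", "p/2"],
    "entry-2": ["p/2", "p/3"],
}


def parse_secondary_inlets(entry: str):
    """A04: each pool thread parses one entry page on its own."""
    return FAKE_WEB.get(entry, [])


class PublicFilter:
    """A05: shared de-duplication filter (a set stands in for the real filter)."""

    def __init__(self):
        self._seen = set()
        self._lock = threading.Lock()

    def select_new(self, links):
        with self._lock:
            fresh = [l for l in dict.fromkeys(links) if l not in self._seen]
            self._seen.update(fresh)
            return fresh


def crawl_and_parse(link: str):
    """A06: crawl one secondary inlet and return its parsed result."""
    return link, f"content of {link}"


def run(entries):
    database = {}                     # stand-in for the database checked in A02
    public_filter = PublicFilter()
    with ThreadPoolExecutor(max_workers=4) as pool:
        inlet_lists = list(pool.map(parse_secondary_inlets, entries))   # A04
        fresh = []
        for links in inlet_lists:     # A05 (done here on the threads' behalf)
            fresh.extend(public_filter.select_new(links))
        for link, content in pool.map(crawl_and_parse, fresh):          # A06
            database[link] = content
    return database


db = run(["entry-1", "entry-2"])
```

The link `p/2` appears on both entry pages but is crawled only once, because the second submission is rejected by the shared filter.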
In particular, a distributed consensus method is used to determine the global computation thread responsible for the recovery task of the next thread.
In addition, a public Bloom filter is used to select the secondary inlets that are not duplicates.
The technical effect of the present invention is illustrated below through a concrete example.
In this example, the entry page is set to the first page of the post list of a forum on Baidu Tieba:
http://tieba.baidu.com/f?kw=%E9%AD%85%E6%97%8F&ie=utf-8
This page is written to the database, the parsing template for each of its features is also written to the database, and a page-information storage table for the page is generated. This completes the preparation before the system starts.
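The entry URL above carries its forum keyword in percent-encoded UTF-8. As a small aside, the following sketch shows how the URL can be split into its parts with Python's standard `urllib.parse` module before it is stored; the variable names are illustrative only.

```python
from urllib.parse import parse_qs, urlsplit

ENTRY_URL = "http://tieba.baidu.com/f?kw=%E9%AD%85%E6%97%8F&ie=utf-8"

parts = urlsplit(ENTRY_URL)
# parse_qs decodes percent-escapes as UTF-8 by default.
query = parse_qs(parts.query)

host = parts.netloc        # site that serves the post list
keyword = query["kw"][0]   # decoded forum keyword
encoding = query["ie"][0]  # input encoding declared in the URL
```

Decoding shows that the `kw` parameter is the forum name 魅族 and that `ie` declares UTF-8 as the input encoding.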
Step 1.1: start the system.
Step 1.2: the system checks the database; the database contains the entry page, the page parsing template and the page-information storage table, so the system starts successfully.
Step 1.3: the system initializes the public Bloom filter and builds the core framework of the stable nucleus, after which it begins generating and running crawler threads.
Step 1.4: a crawler thread A is generated from the thread pool. A loads the example entry page from the database and crawls it; after fetching the page, it parses out four secondary page links:
Page α: http://tieba.baidu.com/p/3953810314
Page β: http://tieba.baidu.com/p/3969039572
Page γ: http://tieba.baidu.com/p/3969049452
Page δ: http://tieba.baidu.com/p/3969020668
Step 1.5: these four page links are sent to the public Bloom filter for filtering; the check finds that the three pages α, β and δ have not yet been crawled.
Step 1.6: three independent threads B, C and D are generated from the thread pool to crawl these three pages respectively, and the crawl results are then stored in the database.
In the example above, as shown in Fig. 6a, the final running stage of the system has the four threads A, B, C and D running simultaneously, and the stable nucleus at this point consists of these four threads. The following describes how a fault occurring at this stage is handled.
Suppose that at this stage threads A, C and D all run normally, while thread B, when parsing its assigned secondary page α, encounters missing page content for which the parsing template has no corresponding handling. Thread B therefore falls into an endless loop. It is now in an abnormal running state: it cannot stop running, and it cannot send any signal to the outside.
At this moment thread A has just finished crawling its entry page. It enters step A and checks the thread next to it on the loop, thread B, and finds that thread B is stuck in an endless loop or a blocked state. Thread A therefore marks thread B as dead, removes it from the stable nucleus's mount point loop and sends it to the recovery area. The stable nucleus is now in the state shown in Fig. 6b: the mount point loop contains the three threads A, C and D, and the thread recovery area contains thread B.
Because the thread recovery area now holds thread B awaiting reclamation, a global recovery task is created. This recovery task is relatively time-consuming, so its load has to be distributed among the three healthy threads A, C and D. Suppose that in the following period threads A, C and D each run step B and determine whether it is their turn to perform the global computation. The stable nucleus's computation-task distribution algorithm determines that threads A and C do not need to perform the global computation and that thread D does, so the recovery task for thread B is carried out by thread D as the global computation.
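The text does not specify the distribution algorithm by which exactly one healthy thread ends up with the global computation. The sketch below is a deliberately simple stand-in, not a real consensus protocol: every thread evaluates the same deterministic rule over the same view of the loop, so all of them reach the same answer without exchanging messages. The function names and the hash-based rule are assumptions, and the rule will not necessarily pick thread D as in the narrative.

```python
import hashlib


def elect(loop, recovery_task_id: str) -> str:
    """Pick one thread from the loop, deterministically, for a given recovery task."""
    ordered = sorted(loop)   # same order for every caller
    digest = hashlib.sha256(recovery_task_id.encode("utf-8")).digest()
    return ordered[int.from_bytes(digest[:4], "big") % len(ordered)]


def is_my_turn(me: str, loop, recovery_task_id: str) -> bool:
    """What each healthy thread asks itself when it reaches step B."""
    return elect(loop, recovery_task_id) == me


healthy = ["A", "C", "D"]
votes = {name: is_my_turn(name, healthy, "reclaim:B") for name in healthy}
```

The rule only works while every thread sees the same loop membership; a real system in which views can diverge would need an actual agreement protocol, which is presumably why the claims refer to a distributed consensus method.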
Thread D enters step C and executes the recovery task. It finds that thread B in the thread recovery area is still running, so thread D forcibly shuts down thread B, making it release its resources, and deletes it from the system. In addition, during the reclamation check thread D finds that the secondary page α that thread B was to crawl has still not been crawled and that this was the task's first execution, so the error may have been caused by external factors; a new thread E is therefore generated to crawl secondary page α again. The stable nucleus is now in the state shown in Fig. 6c: the mount point loop contains the four threads A, C, D and E, and the thread recovery area is empty.
Thread B's fault, however, arose because the parsing template for secondary page α does not adequately handle the special case of missing content. After downloading secondary page α, thread E therefore runs into the same missing content and suffers the same fault as thread B. Thread E then goes through the same sequence as thread B and is sent to the thread recovery area. The difference is that when step A is performed on thread E, it is found that this is the second failed execution for secondary page α, so no further crawl task for secondary page α is generated. The stable nucleus is now in the state shown in Fig. 6d: the mount point loop contains the three threads A, C and D, and the thread recovery area is empty.
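The retry decision in this walkthrough (resurrect after the first fault, give up after the second) can be expressed as a small policy object keyed by task. The class name `RetryPolicy` and its method are hypothetical, and counting faults per task URL is an assumption about how "first occurrence" is tracked.

```python
from collections import Counter


class RetryPolicy:
    """Allow one resurrection per task; mark the task as a problem task after that."""

    def __init__(self):
        self._faults = Counter()
        self.problem_tasks = set()

    def on_fault(self, task: str) -> bool:
        """Record a fault; return True if the task should be resurrected."""
        self._faults[task] += 1
        if self._faults[task] == 1:
            return True              # first occurrence: possibly an external cause
        self.problem_tasks.add(task)  # repeated fault: stop retrying
        return False


ALPHA = "http://tieba.baidu.com/p/3953810314"
policy = RetryPolicy()
retry_after_b = policy.on_fault(ALPHA)   # thread B fails on page α
retry_after_e = policy.on_fault(ALPHA)   # thread E fails on page α
```

Limiting a task to a single resurrection keeps a deterministic parser bug from consuming a new thread on every cycle, while still tolerating one transient failure.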
It should be noted that the embodiments described above are merely preferred embodiments of the present invention and must not be construed as limiting its scope of protection. Any minor change or modification made to the present invention without departing from its concept falls within the scope of protection of the present invention.
Claims (4)
1. A thread organization method, comprising the steps of:
A: on the mount point loop, a thread monitors the operation of its next thread and checks whether a fault exists; if a fault exists, the faulty thread is removed from the mount point loop and sent to the thread recovery area, and it is determined whether this is the first occurrence of the fault,
wherein, if it is not the first occurrence of the fault, the task of the next thread is marked as a problem task, and problem tasks are no longer executed,
and if it is the first occurrence of the fault, the method proceeds directly to step B;
B: determine whether a global task needs to be executed,
if a recovery task exists, execute the recovery task, reclaim all threads in the thread recovery area, and add the tasks that are not marked as problem tasks to the resurrection task list,
if no recovery task exists, proceed to step C;
C: determine whether the resurrection task list contains a task that needs to be resurrected, and if so, generate a new thread to execute that task.
2. The thread organization method according to claim 1, characterized in that, before step A, the method further comprises:
A01: start the multithreaded crawler system;
A02: initialize the connection between the multithreaded crawler and the database, and check whether the database required for the crawler to run is valid;
A03: the system loads shared resources;
A04: the system generates multiple threads in a thread pool, and each thread independently parses out secondary inlets;
A05: each thread sends the secondary inlets it has obtained to a public filter, which selects the secondary inlets that are not duplicates;
A06: generate new threads in the thread pool to crawl and parse the secondary inlets, and store the results in the database.
3. The thread organization method according to claim 1, characterized in that a distributed consensus method is used to determine the global computation thread responsible for the recovery task of the next thread.
4. The thread organization method according to claim 2, characterized in that a public Bloom filter is used to select the secondary inlets that are not duplicates.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510716958.5A CN105302527B (en) | 2015-10-29 | 2015-10-29 | Thread method for organizing |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105302527A true CN105302527A (en) | 2016-02-03 |
CN105302527B CN105302527B (en) | 2018-01-19 |
Family ID=55199831
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510716958.5A Expired - Fee Related CN105302527B (en) | 2015-10-29 | 2015-10-29 | Thread method for organizing |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105302527B (en) |
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20030014471A1 (en) * | 2001-07-12 | 2003-01-16 | Nec Corporation | Multi-thread execution method and parallel processor system |
CN101446953A (en) * | 2008-11-25 | 2009-06-03 | 北京邮电大学 | Parallel associated notice board crawler system |
CN102375758A (en) * | 2010-08-20 | 2012-03-14 | 联芯科技有限公司 | Method and device for preventing apparent death of browser of mobile communication equipment |
US20130179730A1 (en) * | 2012-01-09 | 2013-07-11 | Samsung Electronics Co., Ltd. | Apparatus and method for fault recovery |
US20130332941A1 (en) * | 2012-06-08 | 2013-12-12 | Apple Inc. | Adaptive Process Importance |
CN103902452A (en) * | 2014-04-01 | 2014-07-02 | 浙江大学 | Self-repair algorithm for software multi-point faults |
CN103902386A (en) * | 2014-04-11 | 2014-07-02 | 复旦大学 | Multi-thread network crawler processing method based on connection proxy optimal management |
Non-Patent Citations (2)
Title |
---|
DANIEL D等: "Blackboard and Multi-Agent Systems & the Future", 《COLLABORATING SOFTWARE》 * |
张超等: "多线程网络爬虫的设计与实现", 《电脑开发与应用》 * |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108096837A (en) * | 2016-11-25 | 2018-06-01 | 盛趣信息技术(上海)有限公司 | Game robot dynamic identifying method |
CN109542632A (en) * | 2018-11-30 | 2019-03-29 | 郑州云海信息技术有限公司 | A kind of method and device handling access request |
CN114546980A (en) * | 2022-04-25 | 2022-05-27 | 成都云祺科技有限公司 | Backup method, system and storage medium of NAS file system |
CN114546980B (en) * | 2022-04-25 | 2022-07-08 | 成都云祺科技有限公司 | Backup method, system and storage medium of NAS file system |
Also Published As
Publication number | Publication date |
---|---|
CN105302527B (en) | 2018-01-19 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20180119; Termination date: 20181029 |