CN103229535B

CN103229535B - Method and system for cell recovery in a communication network

Info

Publication number: CN103229535B
Application number: CN201180055718.5A
Authority: CN
Inventors: M. R. Khawer; M. Abulius
Original assignee: Alcatel Lucent SAS
Current assignee: Alcatel Lucent SAS
Priority date: 2010-11-19
Filing date: 2011-11-01
Publication date: 2016-10-19
Anticipated expiration: 2031-11-01
Also published as: WO2012067810A1; EP2641419A1; JP5710017B2; US20120131376A1; CN103229535A; JP2014504466A; KR20130086371A; KR101619078B1; EP2641419B1; US8730790B2

Abstract

Disclosed are a method and system that help ensure that any cell crash (i.e., unexpected behavior occurring as a result of a software bug or fault) is localized to a single cell on a single modem board that supports a multi-cell configuration. The control plane and the remaining cells configured on the modem board should therefore remain operational. In addition, the operator should be able to choose to take corrective action (i.e., restart, reconfigure, delete, or create) for a cell on the modem board without affecting the operation of the other configured cells.

Description

Method and system for cell recovery in a communication network

Background

The exemplary embodiments of the invention relate to a cell recovery process on a single modem board that supports a multi-cell configuration using a multi-core processor, where the multi-core processor has a single SMP partition containing all of the processor cores. Although the invention is particularly directed to the field of wireless telecommunications, and will therefore be described with specific reference to it, it will be appreciated that the invention may be useful in other fields and applications.

By way of background, LTE (Long Term Evolution) is a rapidly evolving 3GPP project that aims to improve the UMTS (Universal Mobile Telecommunications System) mobile phone standard to cope with the demands of future communication networks. LTE improves wireless network efficiency and bandwidth, lowers costs, and enhances the service experience. Specifically, LTE makes use of new spectrum opportunities and offers better integration with other open standards. LTE generally includes an LTE RAN (Radio Access Network), also known as E-UTRAN, and an EPS (Evolved Packet System), also called the Evolved Packet Core.

Communication systems are generally split into two primary functions: data plane functions and control plane functions. In previous LTE products, at least two processors were used on the modem board: one to support the control plane functions (non-real-time functions such as Operations, Administration, and Management (OA&M) and call processing management), and another to terminate and support the data plane functions (real-time functions such as LTE Layer 2 processing). The control plane and the data plane used different operating system (OS) instances, such as Linux for the control plane and a real-time OS such as VxWorks (made and sold by Wind River Systems of Alameda, California) for the data plane core. Typically, one modem board supported one sector or cell. To support a multi-cell (e.g., 3-cell or 6-cell) configuration, it was therefore necessary to provide as many modem boards as there were cells.

As an improvement, a multi-core processor may be used on the modem board of an LTE wireless base station. In this case, an operating system such as SMP Linux with the PREEMPT RT patch runs on one SMP (symmetric multiprocessing) partition that contains all eight cores. In this configuration, the control plane (non-real-time threads/processes) and the data plane (real-time threads/processes) share the same operating system instance, even though they are bound to run on different cores.

Because a modem board now supports multiple cells and sectors, cell recovery becomes complicated. In the previous scheme (i.e., one modem board per cell), the cell recovery process after a software crash was simple: the entire modem board was restarted. The modem board then came back up with all hardware and software components properly initialized, ready for the main OA&M-C running on the controller board to reconfigure the cell on the modem board.

With the new architecture, restarting the modem board to recover or rescue a cell is no longer a viable option, because doing so would also restart the control plane and the remaining active cells supported by the modem board. High availability is a key service provider requirement, and disrupting all cells when only one cell has to be restarted or reconfigured after a software crash is not an acceptable option.

Accordingly, there is a need for a method and system that help ensure that any cell crash (unexpected behavior occurring as a result of a software bug or fault) is localized to one cell. The control plane and the remaining cells configured on the modem board should therefore remain operational. In addition, the operator should be able to choose to take corrective action (i.e., restart, reconfigure, delete, or create) for a cell on the modem board without affecting the operation of the other configured cells.

Summary of the invention

Described herein is an efficient cell recovery process for individually restarting (or recovering) a particular cell, either through operator intervention or automatically after a software fault (or crash), without affecting the other active cells on the modem board.

In one embodiment, a computer-implemented method is provided for offering a cell recovery mechanism on a single modem board that supports a multi-cell configuration with a multi-core processor. One or more defects originating from a failing cell are detected, and one or more software components are notified that the cell is failing. Additionally, the resources associated with the failing cell are released, and post-mortem analysis data is collected. An execution environment for hosting a new cell is then set up, and the Operations, Administration, and Management (OA&M) entity running on a remote controller board is informed that a new cell can be started on the modem board.

In another embodiment, a system is provided for offering cell recovery on a single modem board that supports a multi-cell configuration using a multi-core processor. The system includes a modem board and a multi-core processor comprising a plurality of processor cores attached to the modem board, wherein the modem board includes kernel space and user space. The system also includes a cell recovery mechanism configured to perform various functions. These functions may include, for example: detecting one or more defects originating from a failing cell; notifying one or more software components that the cell is failing; releasing the resources associated with the failing cell; collecting post-mortem analysis data; setting up an execution environment for hosting a new cell; and informing the Operations, Administration, and Management (OA&M) entity on a remote controller board that a new cell can be started on the modem board.

In yet another embodiment, a non-transitory computer-usable data carrier stores instructions that, when executed by a computer, cause the computer to perform a cell recovery process. The cell recovery process may include various functions. For example, such functions may include: detecting one or more defects originating from a failing cell; notifying one or more software components that the cell is failing; releasing the resources associated with the failing cell; collecting post-mortem analysis data; setting up an execution environment for hosting a new cell; and informing the Operations, Administration, and Management (OA&M) entity on a remote controller board that a new cell can be started on the modem board.

Further scope of the applicability of the invention will become apparent from the detailed description provided below. It should be understood, however, that the detailed description and specific examples, while indicating preferred embodiments of the invention, are given by way of illustration only, since various changes and modifications within the spirit and scope of the invention will become apparent to those skilled in the art.

Brief description of the drawings

The invention resides in the construction, arrangement, and combination of the various parts of the device and in the steps of the method, whereby the objects contemplated are attained as hereinafter more fully set forth, specifically pointed out in the claims, and illustrated in the accompanying drawings, in which:

Fig. 1 illustrates one embodiment of a platform architecture in accordance with aspects of the invention;

Fig. 2 illustrates an exemplary architecture for implementing the core abstraction layer shown in Fig. 1;

Fig. 3 illustrates an exemplary cell recovery process in accordance with aspects of the invention; and

Fig. 4 illustrates an exemplary sequence of a process termination.

Detailed description of the invention

Referring now to the drawings, in which the showings are for purposes of illustrating the exemplary embodiments only and not for purposes of limiting the claimed subject matter, Fig. 1 provides a view of a system into which the presently described embodiments may be incorporated.

Referring now to Fig. 1, an exemplary platform architecture 100 is shown. This architecture is generally used on a modem board, but it is to be understood that it may be used in other applications. In this embodiment, one partition is defined with all eight cores in it. It is to be appreciated, however, that the multi-core processor 100 may have any number of cores. With this embodiment, it is thus possible to use a single symmetric multiprocessing (SMP) operating system (OS) instance 102 that runs on all of the cores (e.g., eight cores). Since the control plane and the data plane are under one OS instance, care is generally needed to ensure that a problem in the data plane does not bring down the control plane as well.

In this example, the multi-core processor 100 serves three cells (shown as 104, 106, and 108 in the figure). Each cell requires an uplink (UL) scheduler (shown as 110, 112, and 114 in the figure) and a downlink (DL) scheduler (shown as 116, 118, and 120 in Fig. 1).

It is known that the Radio Link Control (RLC) layer is used to segment, concatenate, and correct errors in packet frames sent and received across the LTE air interface. The Radio Link Control and Medium Access Control (RLC/MAC) software is used in the GPRS (2.5G) wireless stack. It provides acknowledged and unacknowledged data transfer between the mobile station and the base station controller (BSC). Thus, the architecture 100 also includes an RLC/MAC block 122, which is the basic transport unit on the air interface used between the mobile station and the network. The RLC/MAC block 122 is generally used to carry data and RLC/MAC signaling.

The multi-core processor 100 also provides an Operations, Administration, and Management (OA&M) module 124 and a call processing (or CALLP) module 126. OA&M is generally used to describe the processes, activities, tools, standards, and the like involved in operating, administering, managing, and maintaining components in a communication network. The CALLP module 126 typically manages the non-real-time aspects of the call processing activities.

In addition, the multi-core processor 100 includes a core abstraction layer (CAL) 128, which generally hides the core-specific details from the Layer 2 (L2) application software. Layer 2 is the data link layer of the seven-layer Open Systems Interconnection (OSI) model of computer networking. The data link layer is the protocol layer that transfers data between adjacent network nodes in a wide area network or between nodes on the same local area network segment. The data link layer provides the functional and procedural means to transfer data between network entities, and it may provide the means to detect and possibly correct errors that occur in the physical layer. Examples of data link protocols are Ethernet for local area networks (multi-node) and the Point-to-Point Protocol (PPP), HDLC, and ADCCP for point-to-point (dual-node) connections. In this case, L2 generally refers to the L2 scheduler processing needed for the LTE air interface, which has very tight real-time requirements.

To meet the real-time performance needs of the base station (which is responsible for handling traffic and signaling between mobile communication devices and the network switching subsystem), an operating system such as SMP Linux with the PREEMPT RT patch may be used. It is to be understood, of course, that other operating systems may be used. To achieve deterministic behavior in such an SMP configuration, the system is preferably implemented in a manner that employs core reservation and core affinity constructs, so that the system behaves comparably to an asymmetric multiprocessing (AMP) system. This is also desirable for getting the best performance out of an OS such as SMP Linux with PREEMPT RT. The use of lockless, zero-copy services, such as buffer management and messaging services, may also help address any latency issues posed by the use of SMP Linux with the PREEMPT RT patch.
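By way of illustration only, core reservation and core affinity may be sketched in Python as follows. The split of two control plane cores and one reserved core per uplink and downlink scheduler is an assumption chosen to fit the eight-core, three-cell example; the function names `plan_core_reservation` and `pin_current_thread` are hypothetical and do not appear in the described embodiments. The only operating system calls used are `os.sched_getaffinity` and `os.sched_setaffinity`, which are available on Linux.

```python
import os


def plan_core_reservation(total_cores, control_cores, cell_ids):
    """Return a map from role to the set of core IDs reserved for that role.

    Assumption for this sketch: the first `control_cores` cores serve the
    control plane, and every cell gets one core for its UL scheduler and
    one core for its DL scheduler.
    """
    control = set(range(control_cores))
    data_cores = list(range(control_cores, total_cores))
    needed = 2 * len(cell_ids)
    if needed > len(data_cores):
        raise ValueError("not enough cores to reserve one per scheduler")
    plan = {"control": control}
    for i, cell in enumerate(cell_ids):
        plan[(cell, "UL")] = {data_cores[2 * i]}
        plan[(cell, "DL")] = {data_cores[2 * i + 1]}
    return plan


def pin_current_thread(cores):
    """Bind the calling thread to `cores` where the OS supports it."""
    if hasattr(os, "sched_setaffinity"):
        # Only request cores that this machine actually offers.
        allowed = set(cores) & os.sched_getaffinity(0)
        if allowed:
            os.sched_setaffinity(0, allowed)
            return True
    return False


# Eight cores, two for the control plane, cells numbered as in Fig. 1.
plan = plan_core_reservation(8, 2, [104, 106, 108])
```

A scheduler thread for cell 104 would then call `pin_current_thread(plan[(104, "UL")])` at startup, so that no control plane thread competes with it for that core.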

One of the main functions of the core abstraction layer (128), as shown in Fig. 1, is to provide high-level applications, such as L2 processing, with various services that utilize the full capabilities of the multi-core platform. The core abstraction layer is thus designed to achieve several goals. First, it supports a BED (Backplane Ethernet Driver) interface that is based on the new Data Path Acceleration Architecture (DPAA), while hiding the DPAA and multi-core-specific implementation from higher-level application software (i.e., L2 software). (The DPAA is designed to optimize multi-core network processing, such as load spreading and sharing of resources, including network interfaces and hardware accelerators.) Second, it utilizes the DPAA hardware components of the P4080 to provide an accelerated data path for user plane data in both the ingress and egress directions. Third, it provides as much flexibility as possible so as to adapt easily to configuration changes (i.e., without requiring code changes). An example of a CAL configuration is the DPAA resource configuration for buffer pools, ingress frame queues, and egress frame queues.

It is generally known that system memory in an operating system such as Linux can be divided into two distinct regions: kernel space and user space. Kernel space is where the kernel (i.e., the core of the operating system) executes (i.e., runs) and provides its services.

Memory generally consists of RAM (random access memory) cells, whose contents can be accessed (i.e., read and written) at very high speed but are retained only temporarily (i.e., while in use or, at most, while the power supply remains on). Its purpose is to hold the programs and data that are currently in use.

User space is the set of memory locations in which user processes (i.e., everything other than the kernel) run. A process is an executing instance of a program. One of the roles of the kernel is to manage the individual user processes within this space and to prevent them from interfering with each other.

Kernel space can be accessed by user processes only through the use of system calls. A system call is a request, made by an active process in a Unix-like operating system, for a service performed by the kernel, such as input/output (I/O) or process creation. An active process is a process that is currently progressing in the CPU, as opposed to a process that is waiting for its next turn in the CPU. I/O is any program, operation, or device that transfers data to or from a CPU and to or from a peripheral device (such as a disk drive, keyboard, mouse, or printer).

Thus, kernel space is strictly reserved for running the kernel, kernel extensions, and most device drivers. In contrast, user space is the memory area where all user-mode applications work, and this memory can be swapped out when necessary.

Referring now to Fig. 2, an exemplary architecture 200 that achieves these and other goals is shown. The core abstraction layer (CAL) 201 includes various modules in user space, including, but not limited to: a core abstraction layer initialization (CALInit) module 202, which loads the LTE network configuration and any static parsing, classification, and distribution (PCD) rules to the frame managers (FMan) 230 and 232 and sets up the CAL framework based on a set of configuration files; a core abstraction layer buffer (CALBuf) module 204; a core abstraction layer messaging (CALMsg) module 206, which provides messaging services to the L2 software for sending and receiving user plane data to or from another board (i.e., the eCCM); a core abstraction layer parsing, classification, and distribution (CALPcdFmc) module 208, which provides the PCD rules and configurations used by each FMan (230, 232) to route ingress frames to the appropriate cores; and a core abstraction layer DPAA trace (CALDpaaTrace) module 210, which provides tracing capabilities for enabling and disabling traces in the core abstraction layer DPAA driver (CALDpaaDriver) 212, which is a kernel space module.

The architecture 200 also includes a suitable operating system 214, such as SMP Linux with the PREEMPT RT patch. The operating system 214, in turn, supports various drivers, such as the aforementioned CALDpaaDriver 212, at least one frame manager (FMan) driver 216, at least one buffer manager (BMan) driver 218, and at least one queue manager (QMan) driver 220.

As shown in Fig. 2, the architecture 200 may suitably include a P4080 CoreNet fabric 222, which is an interconnect architecture suitable for scalable on-chip networks that connects multiple Power Architecture processing cores to caches, stand-alone caches, and memory subsystems.

The P4080 processor includes an implementation of the DPAA. Thus, the architecture 200 may further include a P4080 DPAA 224. The DPAA 224 is designed to optimize multi-core network processing, such as load spreading and sharing of resources, including network interfaces and hardware accelerators. As shown, the DPAA 224 generally includes various managers, such as a BMan 226, a QMan 228, and first and second FMan 230 and 232, respectively.

It is known that in a wireless multiple-access communication system, transmitters and receivers may communicate using a multi-layer communication stack. The layers may include, for example, a physical layer, a Medium Access Control (MAC) layer, a Radio Link Control (RLC) layer, a protocol layer (e.g., a Packet Data Convergence Protocol (PDCP) layer), an application layer, and so on. The RLC layer receives service data units (SDUs) from the PDCP layer and concatenates or segments the SDUs into RLC protocol data units (PDUs) for transmission to the MAC layer.

Accordingly, the CALBuf module 204 facilitates lockless buffer management services for L2 applications for use in RLC SDU processing. As is known in the art, a non-blocking algorithm ensures that threads competing for a shared resource do not have their execution postponed indefinitely by mutual exclusion. A non-blocking algorithm is lockless (or lock-free) if system-wide progress is guaranteed. The CALBuf module 204 also supports querying for buffer pool statistics (e.g., pool depletion state, depletion count, pool availability state, pool allocation error count, etc.). The CALBuf module 204 generally interfaces with the CALDpaaDriver 212 to implement these services. The lockless buffer management scheme provided by the CALBuf module 204 is extremely critical for proper system operation in a multi-core environment, where a lock taken by a non-real-time process may cause latency issues for a real-time process that is waiting for the release of that lock.

The CALDpaaDriver 212 is the kernel space component of the CAL 201, and it uses the BMan and QMan APIs to help implement and provide the buffer management services and messaging services. As used herein, the term API (or application programming interface) refers to an interface implemented by a software program that enables it to interact with other software. It facilitates interaction between different software programs, similar to the way a user interface facilitates interaction between users and computers. An API is implemented by applications, libraries, and operating systems to determine their vocabularies and calling conventions, and it is used to access their services. It may include specifications for routines, data structures, object classes, and the protocols used to communicate between the consumer and the implementer of the API.

The CALDpaaDriver 212 is generally responsible for managing the DPAA resources (buffer pools and frame queues) to be used for user plane data distribution; providing a user space interface to the other CAL modules, via various file operations (such as open, close, and I/O control (ioctl)), for initialization, buffer management, and messaging services; performing kernel-to-user-space (K-U) buffer mapping; providing DPAA buffer pool and receiver and transmitter statistics; and implementing services for managing ring buffers. It should be noted that ring buffers generally represent the L2 software queues of the CAL, and they are used to store the frame descriptors (FDs) destined for a specific L2 downlink scheduler thread. The CALMsg module 206 provides APIs for L2 to retrieve buffer descriptors from a ring.
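As a minimal sketch of how a ring of frame descriptors can be used without a lock, the following Python class models a single-producer, single-consumer ring in which the producer writes only the tail index and the consumer writes only the head index. The class name `SpscRing` and the string descriptors are hypothetical, and this is not the CALDpaaDriver implementation. Python is used only to show the index discipline; a kernel driver would additionally need memory barriers suited to the processor.

```python
class SpscRing:
    """Single-producer, single-consumer ring of frame descriptors."""

    def __init__(self, capacity):
        # One slot is always left empty so that "full" and "empty"
        # can be told apart from the two indices alone.
        self._slots = [None] * (capacity + 1)
        self._head = 0  # next slot to read; written only by the consumer
        self._tail = 0  # next slot to write; written only by the producer

    def put(self, frame_descriptor):
        """Producer side: store a descriptor, or return False if full."""
        next_tail = (self._tail + 1) % len(self._slots)
        if next_tail == self._head:
            return False
        self._slots[self._tail] = frame_descriptor
        self._tail = next_tail  # publish the slot after it is filled
        return True

    def get(self):
        """Consumer side: return the oldest descriptor, or None if empty."""
        if self._head == self._tail:
            return None
        frame_descriptor = self._slots[self._head]
        self._slots[self._head] = None
        self._head = (self._head + 1) % len(self._slots)
        return frame_descriptor
```

Because each index has exactly one writer, neither side ever waits for the other, which is the property that keeps a non-real-time producer from delaying a real-time downlink scheduler thread.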

All of the CAL components described above are generally platform middleware (running in user space), with the exception of the CALDpaaDriver 212. The CALDpaaDriver 212 is a custom driver that runs in kernel space, and it is designed to implement and provide the services needed by the CAL user space middleware, in particular those services that depend on the P4080 DPAA hardware components.

The CALBuf module 204 provides buffer management services that are used exclusively for "fast path" data processing. The CALBuf module 204 provides user space APIs to the L2 application. The CALBuf module 204 collaborates with the CALDpaaDriver 212 to provide zero-copy, lockless buffer management services for buffers that the CALDpaaDriver 212 creates but that are managed by the BMan 226.

As stated above, a lockless buffer management scheme is important for meeting the performance requirements of a modem board based on a multi-core processor that uses one partition containing all eight cores and runs SMP Linux with PREEMPT RT. Without such a scheme, the system may be subject to unbounded latency spikes, which may break down the entire system.

We now turn to the case of a failing or failed cell on the modem board. With regard to the exemplary embodiments, there is no distinction between a failing cell and a failed cell. In either case, the end result is that the resources acquired by such a cell are to be released. To that end, a cell recovery process may be implemented. One of the primary requirements of the cell recovery process is that the recovery of a cell should not impact the functionality, behavior, or performance of the other cells on the same modem board that hosts the failing cell. In addition, cell recovery must not introduce instability or resource shortages that could critically affect the other active cells on the modem board.

Turning now to Fig. 3, an exemplary cell recovery process is shown in a flow chart. The basic functions of the cell recovery process are set forth below:

Detect one or more defects originating from a failing (or failed) cell and send appropriate messages to the affected software components (310);

Release (or clean up) the resources associated with the failing cell (320);

Collect post-mortem analysis data related to the failing cell (330); and

Set up an execution environment capable of hosting a new cell and notify the OA&M-C on the remote controller board that a new cell can be started on the modem board (340).
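The four functions listed above can be sketched, purely for illustration, as a single Python routine operating on an in-memory model of the modem board. The `ModemBoard` data class, its field names, and the message tuples are assumptions made for this sketch; the described embodiments distribute these steps across several software components rather than implementing them in one function.

```python
from dataclasses import dataclass, field


@dataclass
class ModemBoard:
    resources: dict = field(default_factory=dict)      # resource name -> owning cell
    notifications: list = field(default_factory=list)  # messages sent so far
    postmortem: dict = field(default_factory=dict)     # cell -> collected data
    ready_cells: set = field(default_factory=set)      # cells that may be restarted


def recover_cell(board, cell_id, defect):
    """Run steps 310 to 340 for one failing cell; other cells are untouched."""
    # 310: tell the affected software components that the cell is failing.
    board.notifications.append(("cell_failing", cell_id, defect))

    # 320: release only the resources owned by the failing cell.
    released = sorted(r for r, owner in board.resources.items() if owner == cell_id)
    for name in released:
        del board.resources[name]

    # 330: keep enough data for post-mortem analysis.
    board.postmortem[cell_id] = {"defect": defect, "released": released}

    # 340: mark the execution environment ready and inform the OA&M-C.
    board.ready_cells.add(cell_id)
    board.notifications.append(("oamc_cell_can_start", cell_id))
    return released
```

The point of the sketch is the isolation property: the routine filters strictly by the owning cell, so the resources of cells 106 and 108 are never visited when cell 104 is recovered.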

It is to be understood that the exemplary cell recovery process is generally not a single software component. Rather, the cell recovery process suitably comprises a set of integrated procedures tied into many of the other software components associated with the modem board. The cell recovery process imposes various rules on all software components that manage cell resources. Specifically, the modem software components should follow several common design and coding rules.

First, the software components should be able to anticipate or detect the failure of any other software and/or hardware components under their responsibility. Errors may be anticipated by periodically polling states and statistics. Likewise, faults should be detected within a predetermined time period (software and hardware watchdogs may be used). The health of any running entity should be monitored. Additionally, any software component that allocates or activates system resources for a cell must provide a cleanup procedure for those resources.
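A software watchdog that detects a fault within a predetermined time period might, as one possibility, be sketched as follows. The `Watchdog` class, the entity names, and the two-second timeout are hypothetical. Timestamps are passed in explicitly so that the behavior is deterministic; a real monitor would read a monotonic clock.

```python
class Watchdog:
    """Flags monitored entities whose last heartbeat is older than a timeout."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self._last_seen = {}  # entity name -> time of last heartbeat

    def heartbeat(self, entity, now):
        """Record that `entity` reported itself healthy at time `now`."""
        self._last_seen[entity] = now

    def expired(self, now):
        """Return the entities that have been silent for longer than the timeout."""
        return sorted(
            entity
            for entity, seen in self._last_seen.items()
            if now - seen > self.timeout_s
        )
```

An entity that stops sending heartbeats appears in `expired()` once the timeout has elapsed, which bounds the detection time by the timeout plus the polling interval.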

Suitably, the architecture 200 also includes a "trigger" process, which is a middleware component that starts all of the software processes (including the Layer 2 scheduler processes). It should be noted that an evolved Node B (eNodeB) supports the LTE air interface and performs the radio resource management activities related to the failing/failed cell. In this regard, the LTE eNodeB application is a set of cooperating processes. These processes need to be started and monitored by a separate, reliable process. This supervising process is called the trigger. The trigger is meta- and data-driven: it receives, via a set of configuration files, the data needed to start, monitor, and recover the application processes. The trigger is started by the Linux init process (via a startup script), and it then starts all of the other processes that make up the application. By design, the trigger preferably uses only Linux services. The trigger process is the parent process of all of the application processes. The trigger generally "listens" periodically for information related to process termination events (including Layer 2 process terminations).

When the code of a Layer 2 process associated with a cell fails (i.e., the Layer 2 process terminates unintentionally, outside the control of the code), some actions still take place in the context of the process before it terminates. Linux typically invokes, in the context of the failing thread, the signal handlers and the exit handlers set up by the LEC (Linux Error Collector) and other libraries. Note that some exit handlers are registered with the LEC and are called subsequently to collect information when the process terminates. Linux delivers signal notifications by way of signal handlers. Linux transparently calls the "close" function of each driver opened by the failing process. In particular, it calls the fast path driver CALDpaaDriver 212. This is an opportunity to clean up cell-related resources in kernel mode. If so configured, Linux generates a core dump when the Layer 2 process terminates.

Fig. 4 illustrates the sequence of the process termination phases. Initially, the operating system 102 invokes (transparently to the application code) the error collector handlers to collect data for an error collection snapshot (410). The operating system 214 then invokes (transparently to the application code) the code trace handlers to collect the code trace logs (420). Next, the operating system 214 calls the "close" function of each kernel driver module currently opened by the failing process (430), so that each kernel driver module can clean up any resources allocated for the failing process. Finally, the operating system 214 notifies the "trigger" about the terminated process (440). Optionally, the trigger may also receive the termination event, including the termination status. The trigger then initiates system-wide notification and resource cleanup procedures for the failed cell.

The multi-core processor also supports a "process monitor," which is also a middleware component. The process monitor uses a "publish/subscribe" service to publish the states and availability of the monitored processes. The process monitor may be connected to the trigger via, for example, a Linux message queue. The process monitor is a child process of the trigger process. As used herein, the term "child process" refers to a process created by another process (the parent process). Each process may create many child processes but has only one parent process, except for the very first process, which has no parent. The first process (called init in Linux) is started by the kernel at boot time and never terminates.

When the trigger receives a process termination event (e.g., the SIGCHLD Linux signal), it sends this event to the process monitor via the message queue. Here, SIGCHLD refers to the signal sent to a process when one of its child processes terminates. Using the publish/subscribe procedure, the process monitor publishes the final state of the Layer 2 process as a "state change indication" for that process.
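The hand-off from the parent process to the publish/subscribe service can be illustrated with the following Python sketch, which assumes a POSIX system. The names `ProcessMonitor`, `describe_exit`, and `run_and_report` are hypothetical. Instead of installing a SIGCHLD handler and using a Linux message queue, the sketch simply waits for the child through `subprocess.run` and calls the monitor directly; it shows the flow of the termination status, not the inter-process transport.

```python
import signal
import subprocess
import sys
from collections import defaultdict


class ProcessMonitor:
    """Tiny publish/subscribe hub for process state change indications."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # process name -> callbacks

    def subscribe(self, process_name, callback):
        self._subscribers[process_name].append(callback)

    def publish(self, process_name, state):
        for callback in self._subscribers[process_name]:
            callback(process_name, state)


def describe_exit(returncode):
    """Turn a subprocess return code into a readable final state."""
    if returncode >= 0:
        return "exited with status %d" % returncode
    # On POSIX, a negative return code -N means "terminated by signal N".
    return "killed by " + signal.Signals(-returncode).name


def run_and_report(monitor, process_name, argv):
    """Parent role: start a child, wait for it, then forward its final state."""
    completed = subprocess.run(argv)
    state = describe_exit(completed.returncode)
    monitor.publish(process_name, state)
    return state
```

A subscriber playing the role of the local OA&M module would register with `monitor.subscribe("l2-cell-104", callback)` before the child is started and would receive the final state as soon as the child terminates.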

At its initialization, the local OA&M module 124 on the modem board subscribes to the Layer 2 process "state change indications" published by the process monitor. To subscribe, the OA&M module 124 should know the identities of all of the Layer 2 processes, which generally have a predefined format. When the process monitor publishes a Layer 2 process "state change indication," the OA&M module 124 receives the termination message. When the OA&M module 124 receives an indication that a Layer 2 process has terminated, it retrieves information about the cause of the fault (e.g., a DSP software crash) from shared memory (preferably implemented by a kernel module). The OA&M module 124 sends a message to the OA&M-C (which runs on the remote controller board) to stop all other eNodeB activities related to the failed cell as soon as possible.

If a failed component still holds ownership of cell resources after the software component has failed, then the corresponding cleanup procedure must be activated to relinquish the ownership and release the corresponding cell-related resources. This requirement mainly concerns the P4080 drivers that allocate cell resources in kernel space on behalf of the requesting application processes. At cell configuration, an L2 process in user space registers itself with the CALMsg module 206. The CALMsg module 206 then allocates resources (kernel, hardware) for the requesting L2 process via the CALDpaaDriver 212.

On a modem board, most fatal and unrecoverable software or hardware faults may impact service availability. As mentioned earlier, the modem board generally includes many resources that are shared by multiple cells. Thus, the goals of the exemplary cell recovery process generally include: shutting down the related software and releasing the cell resources without impacting the operational cells; making it possible to restart the failing cell so as to optimize service availability; and collecting sufficient data about the fault for post-mortem diagnosis.

Since a software fault within the scope of the cell recovery process corresponds to the loss of all of the software components related to the cell, the corresponding telecommunication context cannot be restored by the cell recovery process. The cell recovery process cannot avoid a service interruption for the failing cell, but its primary goal is to minimize the duration and scope of that interruption.

It is important to note that any crash involving a control plane process or thread generally affects all of the cells configured on the modem board. Therefore, to handle a normal termination, such as when the operator issues a cell reset command, the L2 process or thread should implement logic that performs a "cleanup" (or release) of all of its cell resources before terminating itself.

The P4080 system resources allocated to a unit are managed in several ways. For example, some resources are allocated from system-wide shared pools. When such resources are allocated to (i.e., owned by) a unit, they cannot be reused by other components until they are explicitly released. If software that owns resources from a shared pool fails, the platform software must clean up these resources on behalf of the owner software. Otherwise, the corresponding resources are lost permanently. To clean up these resources, the platform needs a way to identify which unit owns them. To this end, each individual resource from a shared pool carries an associated tag that specifies the owner of the resource. The owner software should update the tag each time a unit resource is allocated from the shared pool and each time it is released back to it.
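The owner-tag scheme can be illustrated with a minimal Python sketch. The class `SharedPool` and its methods are hypothetical names chosen for illustration only; they are not part of the P4080 platform software, and a real implementation would live in a kernel driver rather than in Python.

```python
class SharedPool:
    """System-wide buffer pool in which every buffer carries an owner tag."""

    def __init__(self, size):
        # owner[i] is the unit ID that owns buffer i, or None if it is free.
        self.owner = [None] * size

    def allocate(self, unit_id):
        """Hand out a free buffer and tag it with the owning unit."""
        for idx, tag in enumerate(self.owner):
            if tag is None:
                self.owner[idx] = unit_id
                return idx
        raise MemoryError("shared pool exhausted")

    def release(self, idx):
        """Normal release by the owner: clear the tag."""
        self.owner[idx] = None

    def reclaim(self, unit_id):
        """Platform-side cleanup: free every buffer tagged with unit_id."""
        reclaimed = [i for i, tag in enumerate(self.owner) if tag == unit_id]
        for idx in reclaimed:
            self.owner[idx] = None
        return reclaimed
```

Because the tag is updated on every allocation and release, `reclaim` can find the buffers of a faulty unit without any cooperation from the failed owner software.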

" fast path " is to have the more brachydactylia by program compared with " normal route " for description Make the term in the path of path.For effective fast path, it is necessary to more more effective than normal route Ground solves the task that great majority occur, and makes the latter process uncommon situation, and individual cases, at mistake Reason, and other are abnormal.Fast path is the form optimized.The entrance that fast path driver uses delays Rushing device pond is the example that system scope shares resource.They are exclusively used in reception user plane packets.

It should be noted that some other resources are allocated from private pools. Resources from a private pool do not need to be tagged with the owner that holds them, because the owner is implicitly known when such resources are allocated. The platform software only needs to know the state of the unit resource; that is, whether the resource is busy or idle. Note that unit resources are established when the execution environment is set up, but they are not necessarily allocated at that time. Allocation is made when a logical unit is configured. The egress buffer pools used by the fast path driver are an example of per-unit private resources. They are dedicated to transmitting user plane packets. Resources that were allocated from shared pools and are owned by the faulty unit should be released by the unit recovery process as quickly as possible.
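By way of contrast with tagged shared pools, a private pool might be modeled as in the following sketch. The class `PrivatePool` and its method names are assumptions made for illustration; the only points taken from the description are that no owner tag is kept, that only a busy/idle state is tracked, and that buffers become allocatable when the logical unit is configured.

```python
class PrivatePool:
    """Per-unit pool: the owner is implicit, so buffers carry no owner tag."""

    def __init__(self, size):
        self.size = size
        # The pool is established with the execution environment,
        # but nothing can be allocated until the unit is configured.
        self.busy = []

    def configure(self):
        """Called when the logical unit is configured."""
        self.busy = [False] * self.size

    def acquire(self):
        """Mark the first idle buffer busy and return its index."""
        for idx, in_use in enumerate(self.busy):
            if not in_use:
                self.busy[idx] = True
                return idx
        raise MemoryError("private pool exhausted or not configured")

    def reset(self):
        """Mark every buffer idle in one step; no owner lookup is needed."""
        self.busy = [False] * len(self.busy)
```

Resetting such a pool is a single operation over the whole pool, which is why cleanup of private resources is simpler than cleanup of shared ones.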

When a layer 2 process terminates unexpectedly, cleanup of its resources can be completed during one of the following two stages: (1) as soon as the operating system detects the termination of the process, or (2) later, when a new instance of the layer 2 scheduler process is created as part of unit initialization and/or configuration. Unless there is a specific problem, cleanup of the unit resources of a layer 2 scheduler process should preferably take place after the corresponding layer 2 scheduler process terminates.

Unit resources may be cleaned up for the following reasons: (1) to make room for the remaining units to run with sufficient system resources; and (2) to avoid exhausting resources and memory if a new instance of the unit is set up.

Unit software resources are allocated at several stages in the "lifetime" of a unit. Some software resources are allocated when the unit is configured. Other resources are allocated and released frequently, depending on the behavior of the unit. One such example is communication buffers. In this case, for incoming packets, the kernel-to-user-space (K-U) mapped buffers are acquired by the appropriate frame manager and must be released by the downlink scheduler of the faulty unit. When a software fault of the unit is being handled, some communication buffers are still owned by the faulty software component.

In general, all unit-specific resources (e.g., buffers, contexts, data structures, interrupts, threads, pending DMAs, etc.) can be affected by a unit fault. However, the way resources are managed is specific to the application architecture and to the behavior of the applicable operating system (e.g., Linux). For example, a distinction can be made between user space resources and kernel space resources, and between private pools of resources and shared pools of resources. Similarly, a distinction can be made between allocation from a central entity (e.g., the CALLP module 126 or the OA&M module 124) and allocation from a unit-specific entity (e.g., the layer 2 scheduler process).

The memory management subsystem of an operating system such as Linux manages several virtual address spaces: one address space for each process and one address space for the kernel. When a process terminates, the operating system automatically releases all resources allocated from the process address space. Consequently, no explicit cleanup is needed for resources that are part of the process address space. Unit-specific resources allocated from the kernel address space on behalf of the faulty unit should be cleaned up by the platform software drivers (e.g., the CALDpaaDriver 212). Note that termination of a process does not automatically clean up unit resources held by kernel modules.

The term "cleanup" as used herein refers not only to memory buffers in kernel space, but also to many other kernel resources and unit-specific actions. Some examples of such kernel resources and actions are described below.

By way of example, if a managed unit is deleted as a result of a unit crash, some interrupts may need to be disabled.

Likewise, if a managed unit is deleted (e.g., as a result of a unit crash), some DMA transfers may need to be disabled. DMA stands for direct memory access, the ability of a device or other entity in a computing system to modify main memory contents without involving the CPU.

A driver may use kernel threads for housekeeping, statistics, or other periodic activities. If a managed unit is deleted, for example as a result of a unit crash, these activities may need to be disabled.
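One way to picture this broader notion of cleanup is a per-unit registry of undo actions that a driver runs when the unit is deleted. The sketch below is an illustration under stated assumptions: `KernelCleanupRegistry` and its methods are hypothetical, and the callables stand in for real operations such as disabling an interrupt, cancelling a DMA transfer, or stopping a kernel thread.

```python
from collections import defaultdict


class KernelCleanupRegistry:
    """Records, per unit, the actions needed to undo kernel-side setup."""

    def __init__(self):
        self._actions = defaultdict(list)

    def register(self, unit_id, description, undo):
        """Remember an undo callable for a resource owned by unit_id."""
        self._actions[unit_id].append((description, undo))

    def cleanup(self, unit_id):
        """Run the unit's undo actions in reverse registration order."""
        done = []
        for description, undo in reversed(self._actions.pop(unit_id, [])):
            undo()
            done.append(description)
        return done
```

Running the actions in reverse order mirrors the usual teardown convention that resources set up last are released first; the description itself does not prescribe an order.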

According to the fast path, DPAA, and Linux configuration requirements, an ingress data flow of the data plane is first assigned to the corresponding P4080 core and then to the correct ingress ring of buffer descriptors associated with the downlink scheduler thread. Specifically, the fast path and DPAA configuration for the data flow should satisfy at least the following requirements:

1. a logical IP address of the unit is defined on the modem board;

2. the frame queues associated with the logical IP address of the unit should be bound to the QMan portal associated with the core hosting the unit; and

3. the context of the frame queues associated with the unit should point to the ingress ring of buffer descriptors associated with the target unit.
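The three requirements can be summarized in a small data model. This is a sketch only: `FrameQueue`, `configure_ingress_flow`, the field names, and the example address are all hypothetical and do not correspond to the actual DPAA or QMan programming interface.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FrameQueue:
    fqid: int
    portal_core: int = 0                 # core whose QMan portal serves this queue
    context_ring: Optional[str] = None   # ingress buffer descriptor ring


def configure_ingress_flow(board_ips, frame_queues, unit_ip, host_core, ring):
    """Apply the three ingress requirements for one unit."""
    board_ips.add(unit_ip)               # 1. logical IP address defined on the board
    for fq in frame_queues:
        fq.portal_core = host_core       # 2. bind queue to the hosting core's portal
        fq.context_ring = ring           # 3. context points to the unit's ingress ring
```

For example, calling `configure_ingress_flow(ips, fqs, "192.0.2.10", host_core=3, ring="unit1_ingress_ring")` would leave every queue in `fqs` served by core 3 and pointing at that ring.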

For unit recovery, the previous configuration of the unit's ingress data flow should be undone. Several actions should be considered in order to undo this configuration. For example, when the CALLP module 126 is called and notified of the unit fault, the logical IP address of the unit should be removed from the modem board. In addition, when the "fast path" CALMsg module 206 is notified that the layer 2 scheduler process has terminated, the mapping of the ingress frame queues can be changed back from the current data plane core to the control plane core.
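The two undo actions might look as follows in outline. The function names, the dictionary-based queue mapping, and the choice of core 0 as the control plane core are assumptions made for this sketch, not details given in the description.

```python
CONTROL_PLANE_CORE = 0  # assumption: the control plane runs on core 0


def on_unit_fault(board_ips, unit_ip):
    """Call-processing side: remove the unit's logical IP address from the board."""
    board_ips.discard(unit_ip)


def on_l2_scheduler_terminated(fq_to_core, unit_fqids):
    """Fast path side: remap the unit's ingress frame queues from the
    data plane core back to the control plane core."""
    for fqid in unit_fqids:
        fq_to_core[fqid] = CONTROL_PLANE_CORE
```

After the remapping, packets that still arrive for the deleted unit are handled on the control plane core rather than on a data plane core that no longer hosts a unit.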

It should be noted that the CALMsg module 206 of Fig. 2 includes various APIs, including registration and deregistration functions. Accordingly, several procedures are generally provided for cleaning up unit resources allocated on behalf of a layer 2 process, including but not limited to: (1) the CALMsg registration function (for each data flow type: ingress and egress); (2) the CALMsg deregistration function (for each data flow type: ingress and egress); and (3) the driver "close" procedure (which covers all data flow types).

The CALMsg registration function of the BED interface requests the fast path kernel module to set up the resources needed for the egress and ingress data flow types. The CALMsg registration function also requests the fast path kernel module to clean up any resources left over from a previous instance of the same layer 2 scheduler process. The layer 2 scheduler process calls the CALMsg registration function of the BED interface when it receives a unit setup request from the CALLP module 126.
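The "clean up leftovers, then set up" behavior of registration can be sketched as below. `FastPathModule`, its `register` and `deregister` methods, and the bookkeeping dictionary are hypothetical stand-ins; the actual CALMsg and fast path kernel module interfaces are not given in the description.

```python
class FastPathModule:
    """Toy model of a kernel module tracking per-unit flow resources."""

    def __init__(self):
        self.resources = {}   # unit_id -> {"pid": int, "flows": set of str}

    def register(self, unit_id, pid, flow_types=("ingress", "egress")):
        """Discard leftovers of an earlier instance, then set up fresh
        resources. Returns True if leftovers were found and removed."""
        leftovers = self.resources.pop(unit_id, None)
        self.resources[unit_id] = {"pid": pid, "flows": set(flow_types)}
        return leftovers is not None

    def deregister(self, unit_id, flow_type):
        """Release the resources of one data flow type for a unit."""
        entry = self.resources.get(unit_id)
        if entry is not None:
            entry["flows"].discard(flow_type)
```

Placing the cleanup inside registration means a crashed instance that never called `deregister` cannot leak resources past the next unit setup.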

The egress buffers allocated by the layer 2 scheduler process from the private egress buffer pool are possible candidates for cleanup by the next instance of the layer 2 process. These resources correspond to the egress data flow type. One reason for deferring the cleanup of these resources is the need to give the appropriate frame manager time to transmit and release all packets that were enqueued by the L2 uplink scheduler before it crashed. The egress buffers of the buffer manager are part of a private buffer pool, and resetting a buffer pool is a relatively simple operation. Therefore, the buffer pool can be reset when the layer 2 scheduler process receives the unit setup message, and need not be reset when the layer 2 scheduler process is created.

In addition to the entry points corresponding to the CAL API specified in the BED interface section, a fast path driver such as the CALDpaaDriver 212 must provide a "close" entry point. Any driver that is called by an application should provide open and close entry points. Unlike the other entry points exposed by the CAL library, the "close" entry point of the fast path module is not exposed to the layer 2 scheduler application. The layer 2 scheduler application should not directly call the "close" function of the fast path driver.

The "close" function of a driver is a standard Linux function commonly used by any driver. If a process terminates, Linux finds all drivers opened by the terminated process and calls the "close" entry point of each of these drivers one by one. The "close" function can determine the PID (process ID) of the calling L2 scheduler process and the referenced core number. The "close" function can associate the PID of the requester with the corresponding unit ID. The "close" function can then clean up the unit resources owned by the corresponding unit ID, such as the rings of buffer descriptors.

The OA&M module 124 can determine which of the core dump files in the FFS (flash file system) corresponds to the last state received from the process monitor. In this way, it can manage the storage space used for core dumps for post-mortem analysis. Post-mortem data typically includes core dumps and code trace logs, which can be used for root cause analysis of the unit crash. The OA&M module 124 attempts to ensure that all required resource cleanup is completed (e.g., by invoking the CALBuf API to release individual ingress data buffers that were in use by the faulty unit). The execution environment is then set up so that it can host a new unit; that is, all software and hardware resources needed by a new unit must be set up before the OA&M-C can configure the unit. The local OA&M module 124 on the modem board then notifies the main OA&M-C that a new unit can again be configured on this board.

Restarting the modem board takes a few minutes and affects all units hosted by the board. In contrast, unit recovery takes only a few seconds and affects only the faulty unit. Typically, collecting the data for post-mortem analysis is the longest step of the unit recovery process. This step generally cannot be shortened, because the digital signal processor (DSP) cannot be restarted while its memory is being dumped. The duration of the other unit recovery steps is not critical, because they are measured in TTI (transmission time interval) ticks (typically less than 1 second).

For software components dedicated to a unit, there are various ways to implement the unit recovery process. One approach is to stop only the faulty software component (associated with the faulty unit), clean up the related resources, and roll the other components (associated with the faulty unit) back to an initial state without stopping them. Because of the wide range of possible faults and the strong interdependencies between software components, this approach is complex and not without risk. A preferred approach may be to "forcibly" stop and then restart all participating software components associated with the faulty unit, even if only one of them is (suspected to be) faulty.
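The preferred "stop everything for the unit, then restart everything" approach can be outlined as follows. The function `recover_unit`, the dictionary layout, and the component names in the usage note are assumptions made for illustration.

```python
def recover_unit(components, faulty_unit):
    """Forcibly stop, then restart, every component of the faulty unit.

    components maps a component name to a dict with the keys
    "unit", "state", and "restarts". Components of other units
    are left untouched.
    """
    affected = sorted(
        name for name, comp in components.items() if comp["unit"] == faulty_unit
    )
    # Stop all components first, even those that are still healthy.
    for name in affected:
        components[name]["state"] = "stopped"
    # Only then restart them, so none runs against stale peer state.
    for name in affected:
        components[name]["state"] = "running"
        components[name]["restarts"] += 1
    return affected
```

With a downlink and an uplink scheduler registered for unit 1 and another scheduler for unit 2, `recover_unit(components, 1)` restarts both unit 1 components and leaves the unit 2 component untouched.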

Those skilled in the art will readily appreciate that the steps of the methods described above can be performed by programmed computers. Herein, some embodiments are also intended to cover program storage devices, e.g., machine-readable or computer-readable digital data storage media that encode machine-executable or computer-executable programs of instructions, wherein the instructions perform some or all of the steps of the methods described above. The program storage devices may be, e.g., digital memories, magnetic storage media (such as magnetic disks and magnetic tapes), hard drives, or optically readable digital data storage media. The embodiments are also intended to cover computers programmed to perform the steps of the methods described above.

The above description merely provides a disclosure of particular embodiments of the invention and is not intended to limit it. Thus, the invention is not limited to the embodiments described above. Rather, it is recognized that those skilled in the art can conceive of alternative embodiments that fall within the scope of the invention.

Claims

1. A system for providing unit recovery for a multi-unit configuration on a modem board in a communication network, the system comprising:

a modem board; and

a multi-core processor connected to the modem board, wherein the multi-core processor includes a plurality of processor cores, the plurality of processor cores being configured under a single symmetric multiprocessing partition and served by a single operating system instance that provides a shared control plane and N data planes, wherein N is the number of units configured on the modem board, and wherein the multi-core processor is configured to perform at least the following functions:

detecting one or more defects originating from a faulty unit;

identifying the faulty unit and the system resources allocated to the faulty unit from system-wide shared pools of resources;

releasing the resources allocated to the faulty unit;

performing recovery of the faulty unit without affecting the shared control plane or other operating units;

setting up an execution environment for hosting a new unit by:

configuring resources associated with the logical IP address of the faulty unit on the modem board;

binding the frame queues associated with the logical IP address of the new unit to the queue manager portal associated with the core hosting the new unit, to establish a packet transmission path;

pointing the context of the frame queues associated with the new unit to the ingress ring of buffer descriptors associated with the target unit of incoming data plane packets; and

informing an operations, administration, and maintenance (OA&M) entity on a remote controller board that a new unit can be started on the modem board.

2. The system as claimed in claim 1, wherein unit recovery is accomplished without rebooting or resetting the multi-core processor and without restarting the single operating system instance shared by the control and data planes.

3. The system as claimed in claim 2, wherein the multi-core processor, upon detecting a unit fault, is further configured to:

notify a call processing module to retrieve the logical IP address of the unit to be removed from the modem board;

change the mapping of one or more ingress frame queues from the current data plane core back to the control plane core, so that incoming packets targeted at a particular core are still handled until a new unit is configured on that core;

disable interrupts of the layer 2 process of the faulty unit;

disable direct memory access transfers associated with the faulty unit.

4. The system as claimed in claim 2, wherein the multi-core processor, upon detecting termination of a layer 2 process, is further configured to:

disable interrupts of the layer 2 process of the faulty unit;

5. The system as claimed in claim 3 or 4, wherein the multi-core processor is further configured to release the allocated resources from the system-wide shared pools of resources, so that the resources of the faulty unit are returned to the shared pools, by:

performing additional bookkeeping so that each individual resource in the system-wide shared pool of communication buffers has a respective tag specifying the owner of the resource;

updating the owner tag each time a unit resource is allocated and each time the unit resource is released from the shared pool.

6. The system as claimed in claim 1, wherein the multi-core processor explicitly invokes platform software drivers to release the unit-specific resources allocated from the kernel address space on behalf of the faulty unit, since the operating system automatically releases only resources allocated from the process address space.

7. The system as claimed in claim 1, wherein one or more software components are notified that the unit is faulty, the software components including at least one of an OA&M entity on the modem board and a call processing module on the modem board.

8. The system as claimed in claim 1, wherein the multi-core processor is further configured to collect post-mortem data for root cause analysis by:

starting a plurality of error collector processes to collect data for an error collection snapshot;

starting a plurality of code trace processes to collect a plurality of code trace logs.

9. The system as claimed in claim 1, wherein the multi-core processor is further configured to collect post-mortem data for root cause analysis by:

starting a plurality of error collector processes to collect data for a core dump;

10. The system as claimed in claim 1, wherein the multi-core processor is further configured to stop the software components associated with the faulty unit and then restart the software components associated with the faulty unit.

11. A computer-implemented method of providing unit recovery for a multi-unit configuration on a modem board in a communication network, the method comprising:

detecting, by a multi-core processor connected to the modem board, one or more defects caused by a faulty unit, wherein the multi-core processor includes a plurality of processor cores, the plurality of processor cores being configured under a single symmetric multiprocessing partition and served by a single operating system instance that provides a shared control plane and N data planes, wherein N is the number of units configured on the modem board;

releasing the resources allocated to the faulty unit;

pointing the context of the frame queues associated with the new unit to the ingress ring of buffer descriptors associated with the target unit of incoming data plane packets; and

12. The method as claimed in claim 11, wherein unit recovery is accomplished without rebooting or resetting the multi-core processor and without restarting the single operating system instance shared by the control and data planes.

13. The method as claimed in claim 12, further comprising:

upon detecting a unit fault, notifying a call processing module to retrieve the logical IP address of the unit to be removed from the modem board;

disabling interrupts of the layer 2 process of the faulty unit;

14. The method as claimed in claim 12, further comprising:

upon detecting termination of a layer 2 process, notifying a call processing module to retrieve the logical IP address of the unit to be removed from the modem board;

disabling interrupts of the layer 2 process of the faulty unit;

15. The method as claimed in claim 11, further comprising releasing the allocated resources from the system-wide shared pools of resources, so that the resources of the faulty unit are returned to the shared pools, by:

16. The method as claimed in claim 11, wherein the multi-core processor explicitly invokes platform software drivers to release the unit-specific resources allocated from the kernel address space on behalf of the faulty unit, since the operating system automatically releases only resources allocated from the process address space.

17. The method as claimed in claim 11, further comprising notifying one or more software components that the unit is faulty, wherein the one or more software components include at least one of an OA&M entity on the modem board and a call processing module on the modem board.

18. The method as claimed in claim 11, further comprising collecting post-mortem data for root cause analysis by:

19. The method as claimed in claim 11, further comprising collecting post-mortem data for root cause analysis by: