LeaseWeb - Network Outage - Post Mortem - 13-02-2004



LeaseWeb
13/02/04, 17:23
*****************************************
LeaseWeb BV
Network Outage
Post Mortem


Affected networks: TeleCity2 / Redbus
Timeframe: 12:11 until 13:30
Date: 13-02-2004
Downtime: 1:19 hours
*****************************************




*******************
History
*******************

12:11 Both our core routers in TeleCity2 shut down.
The network in TeleCity2 directly affects the Redbus network. The InterXion network is unaffected.

12:30 Engineers cannot solve the problem remotely and rush to TeleCity2.

12:50 Engineers arrive in TeleCity2.

13:00 Engineers start troubleshooting.

13:10 Configuration is intact, hardware is intact, power is available, uplinks are available. Logs are being screened and captured.

13:24 Reload of routers.

13:30 Network online.
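
As a quick check of the timeline above, the elapsed time from the start of the outage to each logged event can be computed as follows. This is a minimal sketch: the `TIMELINE` list and the `minutes_since_start` helper are illustrative names, and the short event labels are paraphrased from the history above.

```python
from datetime import datetime

# Times copied from the history above (all on 13-02-2004).
TIMELINE = [
    ("12:11", "Core routers shut down"),
    ("12:30", "Engineers leave for TeleCity2"),
    ("12:50", "Engineers arrive"),
    ("13:00", "Troubleshooting starts"),
    ("13:10", "Checks done, logs captured"),
    ("13:24", "Routers reloaded"),
    ("13:30", "Network online"),
]

def minutes_since_start(timeline):
    """Return (minutes after the first event, label) for every event."""
    start = datetime.strptime(timeline[0][0], "%H:%M")
    result = []
    for clock, label in timeline:
        delta = datetime.strptime(clock, "%H:%M") - start
        result.append((int(delta.total_seconds() // 60), label))
    return result

for minutes, label in minutes_since_start(TIMELINE):
    print(f"+{minutes:3d} min  {label}")
```

The last entry comes out at 79 minutes, which matches the 12:11 to 13:30 timeframe given in the summary.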



*******************
Problem
*******************

Both Cisco 12000s in TeleCity2 shut down due to a memory overflow.
The available memory should normally be sufficient to handle 50% more routes than are currently announced to us.

The 12000s are set up in a failover configuration, but as R2 experienced the same problems as R1, they both shut down.

We did not reload the routers immediately, as this would have prevented proper troubleshooting.



*******************
Possible triggers
*******************

1. We connected a large customer yesterday, who pushes an extra 300 Mbps of traffic.
2. One of our carriers, Abovenet, had the exact same problem on one of its routers (a peering router) in Nikhef.
3. Security issues / bugs / exploits on this particular date



*******************
Preventive measures
*******************

Immediately

1. Different IOS version on R2 (completed)
2. Remote console connected via different uplink (in process)
3. Restrict available bandwidth of new customer to 250Mbps (completed)
4. Active monitoring of available memory on 12000 (completed)
5. Add more restrictions in router configs (completed)
6. Request feedback from Cisco (in process)
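
Item 4, active monitoring of available memory, could look roughly like the sketch below. It is not LeaseWeb's actual monitoring: the function name, the 25% warning and 10% critical thresholds, and the sample figures are all assumptions for illustration, and how the free/total values are read from the router (for example via SNMP) is left out.

```python
def memory_status(free_bytes, total_bytes, warn_ratio=0.25, crit_ratio=0.10):
    """Classify free router memory as 'ok', 'warning' or 'critical'.

    The thresholds are hypothetical defaults, not values from the report.
    """
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    if not 0 <= free_bytes <= total_bytes:
        raise ValueError("free_bytes must be between 0 and total_bytes")
    free_ratio = free_bytes / total_bytes
    if free_ratio < crit_ratio:
        return "critical"
    if free_ratio < warn_ratio:
        return "warning"
    return "ok"

# Made-up sample readings (free, total) in bytes.
samples = [
    (200_000_000, 512_000_000),
    (100_000_000, 512_000_000),
    (30_000_000, 512_000_000),
]
for free, total in samples:
    print(free, total, memory_status(free, total))
```

Alerting at a warning level well before memory runs out leaves time to act before both routers hit the same limit.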


Short-term

1. Set up a new - fully separate - network in another location, to cope with the bandwidth growth (in process)
2. Streamline feedback to customers (in process)
3. Set up fault pages at InterXion (in process)


Mid/Long-term

1. Different vendor for 2nd router (under investigation)



*******************
Compensation
*******************

Customers whose NPT (Network Performance Targets) are not met at month's end will be compensated according to their contract.
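
To put the outage in perspective against a monthly target, the availability for February 2004 can be worked out from the 79 minutes of downtime. This is only a sketch: the actual NPT thresholds are contract-specific and not given here, so the 99.9% target below is a made-up example, and the calculation assumes this was the only outage in the month.

```python
import calendar

def monthly_availability(downtime_minutes, year, month):
    """Return availability over the month as a fraction between 0 and 1."""
    days = calendar.monthrange(year, month)[1]  # 29 for February 2004
    total_minutes = days * 24 * 60
    return 1 - downtime_minutes / total_minutes

DOWNTIME_MINUTES = 79          # 12:11 until 13:30
HYPOTHETICAL_TARGET = 0.999    # assumed for illustration, not from any contract

availability = monthly_availability(DOWNTIME_MINUTES, 2004, 2)
print(f"Availability: {availability:.4%}")
print("Target met" if availability >= HYPOTHETICAL_TARGET else "Target missed")
```

Under these assumptions the month ends at roughly 99.81% availability, which would miss a 99.9% target but meet a 99.5% one.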



*******************
Contact
*******************

We apologize for any inconvenience caused.
Please contact us at info@leaseweb.com or +31 30 2368696 should you have more questions regarding this issue.

wdv
13/02/04, 17:41
Thanks for the excellent report. Is this compensation given automatically?

Domenico
13/02/04, 17:55
Closed on request.
For matters like this, LeaseWeb can of course be reached through the regular channels.