Following exceptionally heavy and sustained rain on the night of 16-17 July, a significant volume of water entered one of the University’s key communications and data facilities at the New Museums Site (the Central Network Hub, or CNH).
From 00:30 a member of Information Services staff was in attendance, coordinating initial efforts to protect key infrastructure. With support from Estates Management staff, efforts were initially made to pump out standing water that had already accumulated, but by 02:30, with the storm intensity having increased, the decision was made to cut power to the room as it was no longer safe. As a result, power supplies to the University’s router to Janet were cut, as were supplies to the Janet network’s regional distribution router for the East of England. This decision was not made lightly, but it protected a significant amount of equipment that would have been damaged had a short circuit occurred.
Staff from Information Services’ Networks team were on site from approximately 07:00 making an initial assessment. The basement area housing the CNH was without lighting and was flooded. Pumping continued with assistance from contractors from the New Museums building works.
At 08:30 the Director of Information Services contacted the Executive Director of Jisc Technologies, who confirmed that the 10Gbps contingency link at the West Cambridge Data Centre (WCDC) was available to us. Whilst the network connection was available, other network equipment necessary to operate it was not in place. Work to bring this contingency connection online was started.
The contingency connection was delivered by Janet as a result of negotiations following water ingress to the CNH earlier in 2015. Whilst the University had an earlier backup connection, this was at a much lower capacity. Work was already being undertaken to deliver a fully resilient, full-capacity, 20Gbps connection from the WCDC, but this could not be delivered until late summer 2015.
In addition to housing key network infrastructure, the CNH also housed servers for a number of UIS-run services, and a smaller number of servers operated by UIS on behalf of other departments. By 10:15 we were aware that a number of services had not failed over gracefully to resilient systems in other data centres. Reconfiguration was initiated, which started to bring services back online.
Shortly after midday the contingency connection from the WCDC was brought online after one of our Network Address Translation boxes was moved from the CNH to the WCDC. At around the same time Janet staff reported that they had been able to bring their network equipment back online, restoring service to other locations in the region, as their equipment had been less affected by the flooding than initially perceived. Unfortunately the University’s main router could not be returned to service.
Throughout the afternoon UIS staff continued to bring services back online, migrating virtual servers to other locations, and by 16:30 we believed that almost all user-facing services had been returned to service.
It became apparent that around one-tenth of the wireless access points were not operating correctly but this issue was resolved at around 18:45.
On Saturday 18 July UIS staff attended the CNH with engineers to ensure that battery backup systems for the equipment housed there were fully operational.
On Sunday 19 July UIS staff attended the WCDC to replace an apparently faulty patch cable that seemed to be causing problems for users of the NAT system.
On Monday 20 July it became apparent that the sole NAT box operational at the WCDC was not of sufficiently high capacity to support the volume of traffic being driven through it. As a result, UIS staff are investigating options to bring other hardware into place. The operational NAT box is one of a pair previously deployed at the CNH; the other box was destroyed by water damage on Friday 17 July.
Temporary building work at the New Museums Site has been undertaken by the contractors to add further levels of water defence and no further water penetrated the CNH over the weekend (though it is noted that the rainfall then was significantly less extreme).
UIS staff have continued work on Monday 20 July to move both virtual and physical servers from the CNH to other locations in order to return services to previous levels of resilience.
Last updated: 16:00, 20 July 2015