


RESILIENT LINKS AND NETWORK REDUNDANCY


This document describes the current network configuration at the ING, which provides resilient links between the WHT and INT telescopes as well as redundant network devices throughout the WHT observing system. The aim is to make the basic network backbone and the observing system tolerant of a link failure, and to give the DE/IO an easy way to restore basic network service if one or more network devices fail.

This is achieved with the Spanning Tree Protocol, a feature of the 3Com switches used at the ING. When an alternative path is available, it restores a network connection automatically in about 30 seconds. A newer version, the Rapid Spanning Tree Protocol, does the same job in about 5 seconds. The ING currently uses the original version simply because some older network devices do not support the rapid one.
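The 30-second figure is consistent with the default IEEE 802.1D timers: a port that takes over spends one forward-delay interval listening and another learning before it forwards traffic. A minimal sketch of that arithmetic, assuming the standard defaults (forward delay 15 s, max age 20 s) rather than values read from the ING switches:

```python
# Standard IEEE 802.1D default timers, in seconds. These are assumed here;
# the values actually configured on the ING switches are not stated.
FORWARD_DELAY = 15  # time spent in each of the listening and learning states
MAX_AGE = 20        # how long stale root information is kept before it expires


def recovery_time(direct_failure: bool) -> int:
    """Approximate time before a backup port starts forwarding traffic."""
    # Listening plus learning always has to elapse.
    transition = 2 * FORWARD_DELAY
    # If the failed link is not directly attached, the switch first waits
    # for the old root information to age out.
    return transition if direct_failure else MAX_AGE + transition


print(recovery_time(direct_failure=True))   # 30
print(recovery_time(direct_failure=False))  # 50
```

With these defaults, a failure that a switch cannot detect directly may take closer to 50 seconds to recover from.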

The first step was to configure the 3Com switches in the backbone and the WHT observing system accordingly. Two conditions must be met to activate this feature in a network. First, every device (switch) in the network must be configured to use Spanning Tree, even if it is not part of any resilient link. Second, all the devices should run the same version of the operating system, or at least all the devices of the same model should. The following table lists the models and versions currently installed.

| Model | Software version | Example |
|---|---|---|
| Switch 3300 | 2.71 | sw1wht |
| Switch 4400 | 3.00 | sw12wht |
| Switch 4900 | 3.00 | gb1wht |
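The second condition (one software version per model) can be checked mechanically against an inventory. The sketch below is illustrative only: the three rows come from the table, while the function name and the extra switch `sw99wht` are hypothetical.

```python
from collections import defaultdict

# Inventory as (hostname, model, software version). The three rows come
# from the table in this document; a real inventory would list every switch.
INVENTORY = [
    ("sw1wht", "Switch 3300", "2.71"),
    ("sw12wht", "Switch 4400", "3.00"),
    ("gb1wht", "Switch 4900", "3.00"),
]


def mixed_version_models(inventory):
    """Return {model: sorted versions} for models running more than one version."""
    versions = defaultdict(set)
    for _host, model, version in inventory:
        versions[model].add(version)
    return {m: sorted(v) for m, v in versions.items() if len(v) > 1}


print(mixed_version_models(INVENTORY))  # {}

# A hypothetical switch (sw99wht) with older software would be flagged:
outdated = INVENTORY + [("sw99wht", "Switch 4400", "2.71")]
print(mixed_version_models(outdated))  # {'Switch 4400': ['2.71', '3.00']}
```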

The switches are connected so that there are several paths from a given source to a given destination. The switches themselves then decide which of these paths is used. This is the current physical resilient topology between the telescopes and inside the WHT.
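How the switches settle on one path can be illustrated with a much simplified model of Spanning Tree: pick a root switch, keep one shortest path from every switch to the root, and block the remaining links. The sketch below assumes a hypothetical triangle of switches named A, B and C (not the real ING topology) and ignores port costs and priorities.

```python
from collections import deque

# Hypothetical three-switch triangle; NOT the real ING topology.
LINKS = [("A", "B"), ("B", "C"), ("A", "C")]


def spanning_tree(links):
    """Split links into (active, blocked) sets, loosely mimicking STP."""
    neighbours = {}
    for a, b in links:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    # Real STP elects the bridge with the lowest bridge ID as root;
    # the alphabetically first name stands in for that here.
    root = min(neighbours)
    parent = {root: None}
    queue = deque([root])
    # Breadth-first search keeps one shortest path to the root per switch.
    while queue:
        node = queue.popleft()
        for nxt in sorted(neighbours[node]):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    active = {frozenset((n, p)) for n, p in parent.items() if p is not None}
    blocked = {frozenset(link) for link in links} - active
    return active, blocked


def show(link_set):
    """Render a set of links as a sorted list of 'X-Y' strings."""
    return sorted("-".join(sorted(link)) for link in link_set)


active, blocked = spanning_tree(LINKS)
print(show(active), show(blocked))  # ['A-B', 'A-C'] ['B-C']

# Break link A-C: the previously blocked B-C link takes over.
active, blocked = spanning_tree([l for l in LINKS if l != ("A", "C")])
print(show(active), show(blocked))  # ['A-B', 'B-C'] []
```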

This configuration can tolerate a broken link between the telescopes or inside the WHT observing system. Under some conditions it can tolerate even two broken links.
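Whether a given set of broken links is tolerated depends only on whether the remaining links still connect every switch. The following sketch enumerates single and double failures on a hypothetical topology, a ring of four switches with one cross link. The names and links are assumptions for illustration, not the ING layout.

```python
from itertools import combinations

# Hypothetical topology: a ring of four switches plus one cross link.
LINKS = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A"), ("A", "C")]
NODES = {node for link in LINKS for node in link}


def connected(links):
    """Return True if every switch in NODES is reachable over the given links."""
    adjacent = {node: set() for node in NODES}
    for a, b in links:
        adjacent[a].add(b)
        adjacent[b].add(a)
    start = min(NODES)
    seen = {start}
    stack = [start]
    # Depth-first walk from one switch over the surviving links.
    while stack:
        for nxt in adjacent[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen == NODES


def survivable(count):
    """List the combinations of `count` failed links that leave the network connected."""
    return [failed for failed in combinations(LINKS, count)
            if connected([link for link in LINKS if link not in failed])]


print(len(survivable(1)))  # 5: every single failure is tolerated
print(len(survivable(2)))  # 8 of the 10 possible double failures
```

In this example the two double failures that are not tolerated are the ones that cut both links of a switch with only two connections (B or D).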

A broken device is a separate matter. All the important devices between the telescopes and in the WHT observing system are duplicated, which provides not only a spare device for each one but also spare ports. If a device fails, the DE/IO has to move the UTP cables from the main device to its spare, as listed in this table.

| Location | Device | Spare | Function | Failure symptoms | Solution |
|---|---|---|---|---|---|
| WHT computer room | gb1wht | gb2wht | Main backbone switch at the ORM. Connects the main switches at the WHT and other buildings such as the Residencia, INT and JKT. | Major problems across the whole site with the Internet connection and with internal access to DNS, web and other services. Laptops unable to access the network. Accounts unavailable. | Move UTP cables to gb2wht. |
| INT clip centre | gb1int | gb2int | Main backbone switch at the INT. Connects the main switches at the INT and provides the connection to the WHT and the Internet. | Major problems with the Internet connection and with internal services at the INT. Observing system stops working. Scratch and accounts services stopped. Image cache system and beowulfs unavailable. | Move UTP cables to gb2int. |
| WHT computer room | sw3wht | sw13wht | Main backbone switch for the observing system at the WHT. Access to the LN plant. | Unable to contact the VAXes and LPAS machines. TCS X-terminal not working. Unable to access the Concam and ntpserver1. | Move UTP cables to sw13wht. |
| WHT control room, blue cabinets by the fire alarm control panel | sw4wht | sw11wht | Connection to Robodimm, Grace, Grhil, CASS and the wireless network in the observing area. Autoguiders. | Unable to contact devices at the Nasmyth stations. Unable to contact Robodimm. Loss of connectivity with the autoguiders. | Move UTP cables to sw11wht. |
| WHT control room, blue cabinet beside the fibre optics one | sw12wht | sw14wht | Network connection for most of the DAS machines. | Loss of connectivity with the DAS machines. | Move UTP cables to sw14wht. |
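For quick reference, the device-to-spare pairs in the table can be kept as a simple lookup. This is only a convenience sketch: the pairs are taken from the table, while the function name and message wording are made up for illustration.

```python
# Main device -> spare device, as listed in the table above.
SPARES = {
    "gb1wht": "gb2wht",
    "gb1int": "gb2int",
    "sw3wht": "sw13wht",
    "sw4wht": "sw11wht",
    "sw12wht": "sw14wht",
}


def recovery_action(device: str) -> str:
    """Return the recovery step for a failed device, if a spare is listed."""
    spare = SPARES.get(device)
    if spare is None:
        return f"No spare listed for {device}"
    return f"Move UTP cables from {device} to {spare}"


print(recovery_action("sw3wht"))  # Move UTP cables from sw3wht to sw13wht
print(recovery_action("sw1wht"))  # No spare listed for sw1wht
```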





For any questions regarding this document, contact Luis Hernandez.


Last Updated: 7th Mar 2003
By: Luis Hernandez