RESILIENT LINKS AND NETWORK REDUNDANCY
This document describes the current network configuration at the ING, which provides resilient links between the WHT and INT telescopes and redundant network devices throughout the WHT observing system. The aim is to make the basic network backbone and the observing system tolerant of a link failure, and to give the DE/IO an easy way to restore basic network service if one or more network devices fail.
This is achieved with the Spanning Tree Protocol, a feature of the 3Com switches used at the ING. It automatically restores a network connection in about 30 seconds when an alternative path is available. A newer version, the Rapid Spanning Tree Protocol, does the same job in about 5 seconds. The ING currently uses the first one, simply because some older network devices do not support the second.
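As a rough illustration of where the 30-second figure comes from, the sketch below uses the default IEEE 802.1D timers (maximum age 20 s, forward delay 15 s). The timers actually configured on the ING switches are not stated in this document, so the defaults are an assumption, and the function name is only illustrative.

```python
# Default IEEE 802.1D Spanning Tree timers, in seconds.
# Assumption: the ING switches run with these standard defaults.
MAX_AGE = 20
FORWARD_DELAY = 15


def stp_recovery_time(direct_failure: bool) -> int:
    """Approximate time before a blocked port starts forwarding traffic.

    A port spends FORWARD_DELAY seconds in the listening state and the same
    time again in the learning state. If the failed link is not directly
    attached to the switch, the switch first waits MAX_AGE seconds for the
    stored root information to expire.
    """
    transition = 2 * FORWARD_DELAY
    return transition if direct_failure else MAX_AGE + transition


print(stp_recovery_time(direct_failure=True))   # 30
print(stp_recovery_time(direct_failure=False))  # 50
```

With these defaults, a failure on a directly attached link is recovered in 30 seconds, while an indirect failure can take up to 50 seconds.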
The first step was to configure the 3Com switches in the backbone and the WHT observing system to use it. Several conditions must be met to activate this feature in a network. First, all the devices (switches) in the network have to be configured to use Spanning Tree, even if they are not part of any resilient link. Second, all the devices should run the same version of the operating system, or at least all the devices of the same model should. The following table lists the models and versions currently installed.
| Model | Software version | Example |
|---|---|---|
| Switch 3300 | 2.71 | sw1wht |
| Switch 4400 | 3.00 | sw12wht |
| Switch 4900 | 3.00 | gb1wht |
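The two conditions (Spanning Tree enabled everywhere, and one software version per model) can be checked against an inventory list. This is a minimal sketch: the hostnames, models and versions come from the table above, but the inventory format, the Spanning Tree flags and the function name are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical inventory: (hostname, model, software version, STP enabled).
# Hostnames, models and versions are from the table; the STP flags are assumed.
INVENTORY = [
    ("sw1wht", "Switch 3300", "2.71", True),
    ("sw12wht", "Switch 4400", "3.00", True),
    ("gb1wht", "Switch 4900", "3.00", True),
]


def check_inventory(inventory):
    """Return a list of problems that would break the Spanning Tree conditions."""
    problems = []
    versions = defaultdict(set)
    for host, model, version, stp_enabled in inventory:
        # Condition 1: every switch must run Spanning Tree.
        if not stp_enabled:
            problems.append(f"{host}: Spanning Tree is disabled")
        versions[model].add(version)
    # Condition 2: all switches of the same model share one software version.
    for model, seen in sorted(versions.items()):
        if len(seen) > 1:
            problems.append(f"{model}: mixed software versions {sorted(seen)}")
    return problems


print(check_inventory(INVENTORY))
```

An empty list means both conditions hold for the devices listed.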
The switches are connected so that there are several paths from a given source to a given destination. Once this is done, the switches themselves decide which of these paths is used. This is the current physical resilient topology between the telescopes and inside the WHT.

This configuration can tolerate one broken link between the telescopes or inside the WHT observing system. Under some conditions it can tolerate even two broken links.
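The idea of tolerating one broken link, and sometimes two, can be illustrated by removing links from a graph and checking whether every switch is still reachable. The link list below is invented for the example and is not the real ING topology; only the device names are taken from this document.

```python
from itertools import combinations

# Hypothetical links between backbone switches (NOT the real ING cabling).
LINKS = [
    ("gb1wht", "gb2wht"),
    ("gb1int", "gb2int"),
    ("gb1wht", "gb1int"),
    ("gb2wht", "gb2int"),
    ("gb1wht", "gb2int"),
]


def is_connected(nodes, links):
    """Return True if every node can be reached from any other over `links`."""
    neighbours = {n: set() for n in nodes}
    for a, b in links:
        neighbours[a].add(b)
        neighbours[b].add(a)
    start = next(iter(nodes))
    seen = {start}
    stack = [start]
    while stack:
        for nxt in neighbours[stack.pop()] - seen:
            seen.add(nxt)
            stack.append(nxt)
    return seen == set(nodes)


def surviving_failures(links, broken):
    """Return each combination of `broken` failed links the network survives."""
    nodes = {n for link in links for n in link}
    return [
        failed
        for failed in combinations(links, broken)
        if is_connected(nodes, [link for link in links if link not in failed])
    ]


print(len(surviving_failures(LINKS, 1)), "of", len(LINKS), "single failures survived")
print(len(surviving_failures(LINKS, 2)), "double failures survived")
```

In this invented layout every single link failure is survivable, while only some pairs of failures are, which matches the behaviour described above.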
Dealing with a broken device is a separate matter. All the important devices between the telescopes and in the WHT observing system are duplicated, which provides spare ports as well as a spare device for each one. If a device fails, the DE/IO has to move some UTP cables from the main device to the secondary one, according to the links in this table.
| Location | Device | Spare | Function | Failure symptoms | Solution |
|---|---|---|---|---|---|
| WHT computer room | gb1wht | gb2wht | Main backbone switch at the ORM. Provides connection between the main switches at the WHT as well as other buildings such as the Residencia, INT and JKT. | Major problems across the whole site with the Internet connection and with internal access to DNS, web and other services. Laptops unable to access the network. Accounts unavailable. | |
| INT clip centre | gb1int | gb2int | Main backbone switch at the INT. Provides connection between the main switches at the INT as well as connection to the WHT and the Internet. | Major problems with the Internet connection as well as internal services at the INT. Observing system will stop working. Scratch and accounts services stopped. Image cache system and Beowulfs unavailable. | |
| WHT computer room | sw3wht | sw13wht | Main backbone switch for the observing system at the WHT. Access to LN plant. | Unable to contact the VAXes and LPAS machines. TCS X-terminal not working. Unable to access the Concam and ntpserver1. | |
| WHT control room. Blue cabinets by the fire alarm control panel. | sw4wht | sw11wht | Connection to Robodimm, Grace, Grhil, CASS and the wireless network in the observing area. Autoguiders. | Unable to contact devices at the Nasmyth stations. Unable to contact Robodimm. Loss of connectivity with the autoguiders. | |
| WHT control room. Blue cabinet beside the fibre optics one. | sw12wht | sw14wht | Network connection for most of the DAS machines. | Loss of connectivity with the DAS machines. | |
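For quick reference, the device and spare pairs in the table above can be kept as a simple lookup. This is only a sketch: the pairs are copied from the table, while the function name and the error handling are illustrative.

```python
# Main device -> spare device, copied from the table above.
SPARES = {
    "gb1wht": "gb2wht",
    "gb1int": "gb2int",
    "sw3wht": "sw13wht",
    "sw4wht": "sw11wht",
    "sw12wht": "sw14wht",
}


def spare_for(device: str) -> str:
    """Return the spare switch that takes over when `device` fails."""
    try:
        return SPARES[device]
    except KeyError:
        raise ValueError(f"{device} has no spare listed") from None


print(spare_for("sw3wht"))  # sw13wht
```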
For any questions regarding this document, contact Luis Hernandez.
Last Updated: 7th Mar 2003 By: Luis Hernandez