

R2D2 Fault Recovery

Possible failures on Startup

Failure at startup may mean no data for the whole night if the repair requires a visit to the Tower and no other member of ING staff is present to supervise your safety.
To avoid this, crashes must be handled correctly and the monitor shut down correctly at the end of each session.
Below is a list of symptoms that distinguish the different faults that can appear at the start of observing, with a description of how to address each one.

  • Telescope fails to slew: the coordinates do not change on the GUI, the differential coordinates remain the same on the server feed, and the message "#3 Cannot Perform Slew" appears on the server feed.
    • This occurs if the Server was not shut down at the end of the previous session. Once the Mount has received the Park command, the monitor cannot be restarted from the GUI. The mount must be reinitialised by carefully following every step of the Shutdown procedure and then the Startup procedure in the Manual.

  • If the USB camera cannot connect, the server crashes with the messages "Creating camera...Device not found...Segmentation fault".
    • This will probably require reseating the USB connector on the PC. Since this means going to the Tower, it is worth trying a soft reboot of the PC (sudo reboot) first.

  • Message "Caught system exception TRANSIENT" (unable to contact the server). This is a CORBA error and requires the naming service to be restarted with this command:
    • > service omniorb-nameserver start
    • The following command will tell you whether the service is active:
    • > service omniorb-nameserver status
  • If the program fails to contact the atlas database, it will stop at that point and the cause will be evident. Try "ping atlas" or check the status of the network nameserver. If either fails, report the network problem to Luis or Alegria.
  • If the GUI starts up with an additional pop-up showing the message "Unable to establish communication with DIMM Server", this looks like a CORBA failure. However, on two occasions it has required rebooting the mount, which means going to the Tower and pressing the rocker switch on the mount.
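The startup symptoms above can be sketched as a small helper for dimmserver. This is only an illustration under stated assumptions: the function names and the "label: action" output are hypothetical, the message fragments are the ones quoted on this page, and ensure_nameserver assumes that "service ... status" exits 0 when the service is running (the usual LSB convention).

```shell
#!/usr/bin/env bash
# Sketch: map a line from the server feed to one of the startup faults above.
# Message fragments are quoted from this page; the function name and the
# "label: action" output format are hypothetical.
classify_startup_message() {
  case "$1" in
    *"Cannot Perform Slew"*)
      echo "slew: run the Shutdown then Startup procedures in the Manual" ;;
    *"Device not found"*)
      echo "camera: try sudo reboot, then reseat the USB connector at the Tower" ;;
    *"TRANSIENT"*)
      echo "corba: restart omniorb-nameserver" ;;
    *"Unable to establish communication with DIMM Server"*)
      echo "corba-or-mount: may need a mount reboot at the Tower" ;;
    *)
      echo "unknown: check the server feed by hand" ;;
  esac
}

# Start the CORBA naming service only if it is not already active.
# Assumes "service ... status" exits 0 when the service is running;
# starting it may need root privileges.
ensure_nameserver() {
  if service omniorb-nameserver status >/dev/null 2>&1; then
    echo "omniorb-nameserver already active"
  else
    service omniorb-nameserver start
  fi
}
```

For example, running ensure_nameserver before restarting the Server avoids restarting a naming service that is already up.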

Pointing problems:

If you suspect pointing problems, first check the server message spool to see whether the program is searching and repeatedly reports "No stars founds" (this is the literal message text). Next, verify that the sky is clear or that other DIMMs are producing data. If both are true, it may still be a focus problem, so check the server message spool for the response to the [Focus+] button in the GUI. If this shows no response (it may even terminate the server), report it in the Fault database.
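The first check above can be sketched as a count of the literal "No stars founds" message in the most recent spool lines. How the spool is captured is an assumption: here the lines are read from standard input, and the default of 50 lines is arbitrary.

```shell
#!/usr/bin/env bash
# Sketch: count "No stars founds" (the literal message text) in the most
# recent lines of the server message spool, read from standard input.
count_no_stars() {
  local lines="${1:-50}"   # how many recent lines to inspect (arbitrary default)
  # grep -c prints 0 and exits 1 when nothing matches; "|| true" keeps the
  # exit status clean without suppressing the printed count.
  tail -n "$lines" | grep -c "No stars founds" || true
}
```

If the server feed has been saved to a file (the name server_feed.log is hypothetical), count_no_stars 100 < server_feed.log gives the count for the last 100 lines.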
Next, check that the time being updated on the GUI is UTC. If it is a few minutes wrong, check the time on dimmserver using the "date" command, force a sync to ntpserver2 or 3 if necessary, and put this information in a fault report. If the GUI time is exactly 1 hour wrong, report it as a Fault.
If none of the above explains the problem, a reasonable response to bad pointing is to press [Stop Loop], wait for parking to complete, and press [Start Loop] again.
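The time check above distinguishes a drift of a few minutes from an error of exactly one hour. A minimal sketch of that decision, assuming the offset is given in whole seconds; the 60 s tolerance and the ±60 s window around 3600 s are illustrative thresholds, not values from the R2D2 software.

```shell
#!/usr/bin/env bash
# Sketch: classify the offset between the GUI time and true UTC.
# Thresholds (60 s tolerance, 3600 s +/- 60 s) are assumptions.
classify_clock_offset() {
  local offset="$1"                        # seconds; positive = GUI ahead of UTC
  local abs=$(( offset < 0 ? -offset : offset ))
  if (( abs <= 60 )); then
    echo "ok"
  elif (( abs >= 3540 && abs <= 3660 )); then
    echo "one-hour: report as a Fault"
  else
    echo "drift: check date on dimmserver, force NTP sync, file a fault report"
  fi
}
```

The offset is the GUI time minus a trusted UTC reference, both as epoch seconds; with GNU date, a time read off the GUI converts via date -u -d "21:03:00" +%s.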

Failures while monitor is running:

  • If the GUI freezes, i.e. becomes unresponsive and prevents any shutdown commands from being sent to the Server, don't panic! The Dome can be closed and the monitor left running until tracking stops automatically. However, it may be possible to recover control and carry out a clean shutdown by starting a second GUI (on the same machine or another, see Start Client, point 3 above). From this second GUI, the Stop Loop and Park commands can be sent, but note that the Server needs to be restarted following a Park. See GUI operations, points 4 and 5 above.
  • If the Server window freezes, check how long the program has been searching for a star without result. If this has reached about 1 hour, the crash is likely to be caused by the memory leak. Recovery requires not only a hard restart of the PC but also a reset of the focuser.

    If, however, the Server terminates and returns the prompt while the telescope is slewing or tracking, it can be restarted without difficulty. The monitor processes will have stopped during the crash, but the mount will finish the slew and continue tracking, and its new status will be read by the Server. Once restarted, the GUI should work as before.
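The memory-leak check above is a matter of elapsed time. A minimal sketch, assuming the search start time is known in epoch seconds (extracting it from the server feed is not shown) and taking 3600 s as the "about 1 hour" threshold; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: flag a search that has lasted about an hour, when the page says
# the memory-leak crash becomes likely. The 3600 s threshold is an assumption.
search_leak_risk() {
  local start="$1" now="${2:-$(date +%s)}"   # epoch seconds
  local elapsed=$(( now - start ))
  if (( elapsed >= 3600 )); then
    echo "risk: ${elapsed} s searching; expect hard PC restart plus focuser reset"
  else
    echo "ok: ${elapsed} s searching"
  fi
}
```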

    Server termination error messages:

    1. Message "terminate called after throwing an instance of 'std::out_of_range' what(): vector::_M_range_check: __n", where __n is a very large number. This is caused by low flux in one image during analysis, which can result from bad seeing on a fainter star such as Alpheratz or Regulus.
      • Solution: Exit GUI, Restart Server then GUI, press [Start Loop]
    2. Message "Calculated box out of ccd dimensions, stopping machine". Caused by failing to find one of the star images during the centering phase, in cloud or bad seeing on a fainter star.
      • Solution: Exit GUI, Restart Server then GUI, press [Start Loop]
    3. Message "terminate called after throwing an instance of boost::exception_detail::clone_impl<....". A crash in the calculation phase, probably caused by a missing image.
      • Solution: Exit GUI, Restart Server then GUI, press [Start Loop]
    4. The above crashes may also occur if the Focuser has failed to correct the focus for a long time. A timeout on the focuser itself will not crash the Server. If you see errors in the Server related to the Focuser, or otherwise suspect problems with it, check the Delta centroid plot on the R2D2 Chart web page. The red points show the mean_y value and should be in the range -15 to -25. If they are not, the data quality may be affected and the Focuser must be reset.
    5. Message "mvIMPACT::acquire::EValTooSmall": communication problem with the USB camera. May require a reboot, see below.
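The termination messages and the mean_y check above can be sketched as two helpers. The message fragments and the -15 to -25 range come from this page; the function names and output labels are hypothetical. awk handles the range test because bash arithmetic is integer-only.

```shell
#!/usr/bin/env bash
# Sketch: map a Server termination message to the likely cause listed above.
diagnose_termination() {
  case "$1" in
    *"vector::_M_range_check"*)
      echo "low-flux: exit GUI, restart Server then GUI, press [Start Loop]" ;;
    *"Calculated box out of ccd dimensions"*)
      echo "star-lost: exit GUI, restart Server then GUI, press [Start Loop]" ;;
    *"exception_detail::clone_impl"*)
      echo "missing-image: exit GUI, restart Server then GUI, press [Start Loop]" ;;
    *"EValTooSmall"*)
      echo "usb-camera: may need a reboot or camera power cycle" ;;
    *)
      echo "unknown: check the focuser and the Delta centroid plot" ;;
  esac
}

# Succeeds (exit 0) when a Delta centroid mean_y value lies in -25..-15.
mean_y_in_range() {
  awk -v y="$1" 'BEGIN { exit !(y <= -15 && y >= -25) }'
}
```

For example, mean_y_in_range -20.5 && echo ok prints ok, while -10 fails and points towards a Focuser reset.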

Initialization of hardware:

Soft Reboot of Dimmserver PC:

> sudo reboot (enter password)
This will not cause the focuser to lose its parameters or position.

Hard Restart of Dimmserver PC:

This is the only way to recover from a fatal PC crash, e.g. one caused by the memory leak (see above). The hard reset appears to reset the focuser's position to 0 (although it has not physically moved) and to make it lose its software settings, requiring the focuser reset described below.
  1. Browse to URL masspdu.ing.iac.es and log in with apc/apc
  2. Click the Device Manager tab
  3. Click Control (left menu)
  4. From the Control Action menu, select "Reboot Immediate"
  5. Tick outlet no. 7, dimmserver PC power (currently on)
  6. Click the Next button
  7. Check that the action reads "dimmserver PC power selected for Reboot Immediate", then click Accept
  8. After about 1 minute, the PC will be up again and accepting logins
  9. Log off APC connection (to allow other users to log in)
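Rather than guessing when the "about 1 minute" has passed, a small polling loop can wait until the PC answers ping again. This is a sketch: the host name dimmserver, the attempt count and the interval are assumptions, and a ping reply only shows the network is up, not that logins are accepted yet.

```shell
#!/usr/bin/env bash
# Sketch: poll a host with ping until it answers or the attempts run out.
# Host name, attempt count and interval are assumptions.
wait_for_host() {
  local host="$1" tries="${2:-30}" interval="${3:-5}"
  local i
  for (( i = 1; i <= tries; i++ )); do
    if ping -c 1 -W 2 "$host" >/dev/null 2>&1; then
      echo "$host up after $i attempt(s)"
      return 0
    fi
    sleep "$interval"
  done
  echo "$host still down after $tries attempts" >&2
  return 1
}
```

Usage from another machine: wait_for_host dimmserver 24 5 waits up to about two minutes.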

Shutdown and Restart of Mount control box:

  • Soft shutdown of Mount control box using handset:
    1. Press the Menu button, navigate with the arrow keys (top right of the main keypad) down to the Settings submenu, and press Enter
    2. Navigate down to the last item, Shutdown Mount, using the arrow keys, and press Enter
    3. Press Enter again to confirm
OR

  • Hard reset by holding down the rocker switch on the unit for 10 seconds to power down
    • Wait 30 seconds before startup
    • Power up by pressing the rocker switch on the unit for 1 second


Power Cycle of CCD camera detector:

  1. Check on the camera whether the green pilot light is lit
  2. Unplug the Camera USB cable (the metallic-looking cable) from the Dimmserver PC. Note that the USB connection on the camera is sealed with rubber
  3. Wait 30 seconds
  4. Plug the Camera USB cable back into a Dimmserver PC socket
  5. Check again on the camera whether the green pilot light is on

Checks and Reset of Focuser following a crash or hard PC reset:

On dimmserver, you need to run the minicom program to communicate with the focuser over its serial protocol. The commands are cryptic and cannot be corrected with backspace/erase. Take careful note of the number of zeroes in each command! If you make a mistake, it is very unlikely that you have done any damage; you will simply see no response after entering #. In that case, just press Enter and start again.
All commands begin with :F (note that the colon, or dos puntos, must be included before the upper-case F) and end with the hash symbol #.
  • > minicom
  • Type Ctrl-A, then e. This should show "local echo on" on the bottom line of the terminal, meaning that you should now see the commands you type.
  • Enter :F7ASKC0#
    The response appears immediately after the command entered.
    • Response :F700001# means the device is recognised as the correctly configured type. All is correct; no further action is needed.
    • Response :F710000# means the Focuser has lost its configuration. It can be reset to the correct one by entering :F010000#
    • No response probably means an error in entering the command. Try again, entering all the characters exactly as follows: :F7ASKC0#
  • Set the step counter to the mid-range point using the command :FB0007000#
  • Check that the focus position is now correct using :F8ASKS0#
    The response should be :F800070000#
  • Quit minicom by entering Ctrl-A, then q
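The focuser replies above can be sketched as a lookup, together with a check of the documented command shape (":F" at the start, "#" at the end). Only the exact strings quoted on this page are recognised. The function names are hypothetical, the reply is assumed to be passed without the echoed command, and the shape check cannot catch a wrong number of zeroes.

```shell
#!/usr/bin/env bash
# Sketch: interpret the focuser replies documented above.
decode_focuser_reply() {
  case "$1" in
    ":F700001#")    echo "configured: no action needed" ;;
    ":F710000#")    echo "config-lost: send :F010000#" ;;
    ":F800070000#") echo "position-ok" ;;
    "")             echo "no-reply: press Enter and re-enter the command" ;;
    *)              echo "unknown: $1" ;;
  esac
}

# Succeeds when a command starts with ":F" and ends with "#".
is_focuser_command() {
  [[ "$1" == :F*'#' ]]
}
```

For example, is_focuser_command 'F7ASKC0#' fails because the leading colon is missing, the most common typing slip this page warns about.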





Contact:  (RoboDIMM Project Scientist)
Last modified: 31 August 2022