TCS Alpha crashes

Introduction

    These notes are mainly aimed at computing staff who have to investigate why an Alpha has crashed. The section about general
recovery procedures may be of interest to the Duty Engineer.
 

General Recovery Procedures

     Try to establish whether the Alpha is responding. From a Sparc:
           ping lpasn

      Check on the system console (the monitor by the Alpha in the WHT, the VT220 in the INT)
      whether there is any activity. When an Alpha crashes it writes messages to the screen and
      then dumps memory to disk (a crash dump). After writing the crash dump, the Alpha reboots
      by itself.

       If there is no response at the console, the system can be interrupted by pressing the halt
       (reset) button or control-P on the console keyboard. Then follow the instructions to take
       a crash dump.

      If there is still no response, power the Alpha off and back on. Bear in mind that
      it may also be necessary to power cycle CAMAC.
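
      The first check above (is the Alpha answering ping?) could be scripted from a
      workstation. The following is only a minimal sketch, not part of the TCS software:
      the function name is_responding, the 30 second timeout and the plain "ping host"
      command line are assumptions. A plain ping normally returns by itself on Solaris,
      but other systems may need extra options such as a packet count.

```python
import subprocess


def is_responding(host, ping_cmd=("ping",), timeout_s=30, run=subprocess.run):
    """Return True if the ping command exits with status 0 within timeout_s.

    ping_cmd and timeout_s are assumptions; adjust them for the local ping.
    The run argument exists only so the function can be tested without a network.
    """
    try:
        result = run(
            [*ping_cmd, host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, OSError):
        # No answer in time, or ping could not be started at all.
        return False
    return result.returncode == 0
```

      Calling is_responding("lpasn"), with lpasn replaced by the real node name, returns
      False when the node does not answer. That is the point at which to go and look at
      the system console.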
     
       

Common causes of a Crash

     INVEXCEPTN

Investigating a Crash

      Log in as SYSTEM or other privileged account.

1) SHOW SYSTEM shows processes running and how long the system has been up.

2) The operator log can be read with:
            SET DEF DSA0:[SYS0.SYSMGR]
            TYPE/PAGE OPERATOR.LOG
    Previous logs can be viewed by giving a relative file version number:
            TYPE/PAGE OPERATOR.LOG;-1

3) The system error log records hardware errors and crashes.
             SET DEF DSA0:[SYS0.SYSERR]
     Convert the format:
             ANAL/ERR/ELV CONVERT ERRLOG.SYS
             ANAL/ERR/SINCE=dd-mmm-yyyy ERRLOG.CVT

             SHO ERROR will show device errors.

             An explanation of messages can be found in:
             OpenVMS System Messages
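
     The /SINCE qualifier takes an OpenVMS absolute date, with a three-letter month
     (for example 12-MAR-2007). As an illustration only, the sketch below builds that
     string from a calendar date; the helper names vms_date and since_qualifier are
     made up for this example and are not part of any OpenVMS tool.

```python
from datetime import date

# Month abbreviations as used in OpenVMS absolute dates.
MONTHS = ("JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC")


def vms_date(day):
    """Format a datetime.date as dd-mmm-yyyy, e.g. 12-MAR-2007."""
    return f"{day.day:02d}-{MONTHS[day.month - 1]}-{day.year:04d}"


def since_qualifier(day):
    """Return the /SINCE qualifier text for the given date."""
    return f"/SINCE={vms_date(day)}"
```

     The month table is spelled out rather than taken from strftime so that the result
     does not depend on the locale of the machine running the script.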
   
4)  CLUE analyzes the dump when the machine is rebooted. When a crash happens, the machine writes memory to
      the dump file and then reboots. Sometimes the (duty) engineer power cycles the Alpha or presses its reset
      button before the dump has been completed, in which case there may be no complete dump to analyze.

             SET DEF DSA0:[SYS0.SYSCOMMON.SYSERR]
             DIR /SINCE=dd-mmm-yyyy  CLUE*.*;* /DATE 

             TYPE/PAGE CLUE$LPASn_ddmmyy_hhmm.LIS
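
      Because the date in a CLUE listing name is written ddmmyy, the names do not sort
      into time order alphabetically. If a directory listing has been copied to a
      workstation, a sketch along the following lines could pick out the most recent
      crash. It assumes only the CLUE$node_ddmmyy_hhmm.LIS pattern shown above; the
      function names and the node name LPAS1 used in the usage note are hypothetical.

```python
import re
from datetime import datetime

# CLUE$<node>_ddmmyy_hhmm.LIS, with an optional ;version suffix.
CLUE_NAME = re.compile(
    r"^CLUE\$(?P<node>[A-Z0-9_$]+?)_(?P<date>\d{6})_(?P<time>\d{4})\.LIS(?:;\d+)?$",
    re.IGNORECASE,
)


def parse_clue_name(filename):
    """Return (node, crash_time) for a CLUE listing name, or None if it does not match."""
    match = CLUE_NAME.match(filename.strip())
    if match is None:
        return None
    try:
        stamp = datetime.strptime(match["date"] + match["time"], "%d%m%y%H%M")
    except ValueError:
        # Digits are present but do not form a valid date and time.
        return None
    return match["node"].upper(), stamp


def newest_first(filenames):
    """Return the matching CLUE listing names, most recent crash first."""
    dated = []
    for name in filenames:
        info = parse_clue_name(name)
        if info is not None:
            dated.append((info[1], name))
    dated.sort(reverse=True)
    return [name for _, name in dated]
```

      For example, parse_clue_name("CLUE$LPAS1_120307_1430.LIS;1") gives the node LPAS1
      and a timestamp of 12 March 2007 at 14:30. Names that do not follow the pattern
      are ignored by newest_first.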

5)  


TCS Software Manager
Last modified: FJG 12 Mar 2007