Run-Time CPU Deconfiguration (CPU Gard)

L1 instruction cache recoverable errors, L1 data cache correctable errors, and L2 cache correctable errors are monitored by the processor runtime diagnostics (PRD) firmware running on the service processor. When a predefined error threshold is met, an error log with warning severity and threshold exceeded status is returned to AIX. At the same time, PRD marks the CPU for deconfiguration at the next boot. AIX will attempt to migrate all resources associated with that processor to another processor and then stop the defective processor.
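
The threshold behavior described above can be pictured with a short sketch. This is only an illustration, not the PRD firmware: the class name, the field names, and the threshold value of 3 are all hypothetical, since the guide says only that the threshold is "predefined".

```python
from dataclasses import dataclass, field

# Hypothetical value; the guide only says the threshold is "predefined".
ERROR_THRESHOLD = 3


@dataclass
class CpuErrorMonitor:
    """Illustrative stand-in for per-CPU cache error tracking."""
    threshold: int = ERROR_THRESHOLD
    counts: dict = field(default_factory=dict)     # CPU id -> errors seen so far
    gard_list: set = field(default_factory=set)    # CPUs marked for deconfiguration at next boot
    error_log: list = field(default_factory=list)  # entries that would be returned to the OS

    def record_error(self, cpu_id: int) -> bool:
        """Count one cache error; return True only when the threshold is first met."""
        self.counts[cpu_id] = self.counts.get(cpu_id, 0) + 1
        if self.counts[cpu_id] >= self.threshold and cpu_id not in self.gard_list:
            # Mark the CPU and emit a single warning-severity log entry.
            self.gard_list.add(cpu_id)
            self.error_log.append(
                {"cpu": cpu_id, "severity": "warning", "status": "threshold exceeded"}
            )
            return True
        return False
```

In this sketch the log entry and the deconfiguration mark are produced together, mirroring the "at the same time" wording above. Migrating work off the processor is left to the operating system and is not modeled.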

Service Processor System Monitoring - Surveillance

Surveillance is a function in which the service processor monitors the system, and the system monitors the service processor. This monitoring is accomplished by periodic samplings called heartbeats.

Surveillance is available during the following phases:
• System firmware bringup (automatic)
• Operating system runtime (optional)

System Firmware Surveillance

System firmware surveillance is automatically enabled during system power-on. It cannot be disabled by the user, and the surveillance interval and surveillance delay cannot be changed by the user.

If the service processor detects no heartbeats during system IPL (for a set period of time), it cycles the system power to attempt a reboot. The maximum number of retries is set from the service processor menus. If the fail condition persists, the service processor leaves the machine powered on, logs an error, and displays menus to the user. If Call-out is enabled, the service processor calls to report the failure and displays the operating-system surveillance failure code on the operator panel on the HMC.
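
The retry sequence above can be sketched as a small loop. This is a minimal illustration under stated assumptions, not service processor code: the function name, the `heartbeat_seen` callback (which stands for one IPL attempt), and the action strings are all hypothetical.

```python
from typing import Callable, List


def firmware_surveillance(heartbeat_seen: Callable[[], bool],
                          max_retries: int,
                          call_out_enabled: bool = False) -> List[str]:
    """Return the actions taken while waiting for heartbeats during IPL.

    heartbeat_seen() models one IPL attempt and returns True if heartbeats
    arrived within the set period (hypothetical callback).
    """
    actions: List[str] = []
    retries = 0
    while not heartbeat_seen():
        if retries == max_retries:
            # Fail condition persists: stay powered on, log, show menus.
            actions += ["leave powered on", "log error", "display menus"]
            if call_out_enabled:
                actions.append("call out")
            return actions
        actions.append("cycle power")
        retries += 1
    actions.append("ipl complete")
    return actions
```

For example, with a retry limit of 2 and an IPL that succeeds on the third attempt, the sketch cycles power twice and then reports completion. With a limit of 1 and no heartbeats at all, it cycles power once and then falls through to the failure actions.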

Operating System Surveillance

Note: This function is not available on a partitioned system.

Operating system surveillance provides the service processor with a means to detect hang conditions, as well as hardware or software failures, while the operating system is running. It also provides the operating system with a means to detect a service processor failure caused by the lack of a return heartbeat.

Operating system surveillance is not enabled by default, allowing you to run operating systems that do not support this service processor option.

You can also use service processor menus and AIX service aids to enable or disable operating system surveillance.

For operating system surveillance to work correctly, you must set these parameters:
• Surveillance enable/disable
• Surveillance interval
  The maximum time the service processor should wait for a heartbeat from the operating system before timeout.
• Surveillance delay
  The length of time to wait from the time the operating system is started to when the first heartbeat is expected.

Surveillance does not take effect until the next time the operating system is started after the parameters have been set.
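
How the three parameters interact can be sketched as a timeout check. This is a rough illustration only: the names `SurveillanceConfig` and `heartbeat_overdue` and the default values are hypothetical. It also assumes that the first heartbeat counts as overdue once the surveillance delay has elapsed, which the guide does not state explicitly.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SurveillanceConfig:
    enabled: bool = False       # not enabled by default, as described above
    interval_s: float = 300.0   # hypothetical value, in seconds
    delay_s: float = 600.0      # hypothetical value, in seconds


def heartbeat_overdue(cfg: SurveillanceConfig,
                      os_start: float,
                      last_heartbeat: Optional[float],
                      now: float) -> bool:
    """Return True if the service processor should treat the OS as timed out."""
    if not cfg.enabled:
        return False
    if last_heartbeat is None:
        # No heartbeat yet: only the start-up delay applies (assumption).
        return now - os_start > cfg.delay_s
    # After the first heartbeat, each gap is bounded by the interval.
    return now - last_heartbeat > cfg.interval_s
```

With an interval of 5 and a delay of 10, a system started at time 0 is not flagged at time 10 but is flagged at time 11 if no heartbeat has arrived. After a heartbeat at time 12, it is flagged once the time passes 17.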