NIC Failure test
We performed the hard NIC failure test by removing one NIC cable from the
active node that was involved in active recording. After the NIC failure, writing to
the same node failed. When the network fails, the server must recognize the
failure and then establish a new connection. In addition, when the network fails, TCP
socket connections are left open on the cluster until Isilon's
OneFS forces them closed. Only after they are closed can the server continue writing.
We can force the open TCP sockets to close in less than 2 minutes by reducing
the TCP keep idle and TCP keep interval timeouts to the optimum values
recommended by Isilon Engineering.
To significantly reduce the video loss duration caused by the TCP Socket Open
condition, set the persistent values in the sysctl.config file as follows:
isi_sysctl_cluster net.inet.tcp.keepidle=61000
isi_sysctl_cluster net.inet.tcp.keepintvl=5000
Refer to KB article 000089232 for further information about how to
configure these parameters.
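As a rough check on the "less than 2 minutes" figure, the keepalive values above can be combined into a worst-case detection time. The sketch below assumes the values are in milliseconds (as in FreeBSD, on which OneFS is based) and that the probe count, net.inet.tcp.keepcnt, is left at the FreeBSD default of 8. Neither assumption is stated in this guide, and the function name is hypothetical.

```python
def worst_case_detection_ms(keepidle_ms: int, keepintvl_ms: int, keepcnt: int) -> int:
    """Approximate upper bound on how long an idle, dead TCP connection stays open.

    The connection must first be idle for keepidle_ms, then keepcnt probes
    are sent keepintvl_ms apart before the socket is dropped.
    """
    return keepidle_ms + keepintvl_ms * keepcnt


KEEPIDLE_MS = 61000   # value from this guide
KEEPINTVL_MS = 5000   # value from this guide
ASSUMED_KEEPCNT = 8   # assumption: FreeBSD default, not stated in this guide

total_ms = worst_case_detection_ms(KEEPIDLE_MS, KEEPINTVL_MS, ASSUMED_KEEPCNT)
print(f"Worst-case close time: {total_ms / 1000:.0f} s")
```

Under these assumptions the result is 101 seconds, which is consistent with the stated bound of less than 2 minutes. A different keepcnt on the cluster would change the result.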
Note
The impact of a NIC failure can be overcome by using NIC aggregation in
Active/Passive Failover mode, which is explained in the next test case. Connectivity
to the nodes that were not affected by the network outage remained
available throughout the test scenario, and no impact was observed.
NIC Failure test with NIC aggregation in Active/Passive
We performed a hard NIC failure test with Active/Passive aggregation by removing the
active NIC port cable. After the network failure, writing to the same node
continued, and the passive NIC immediately became the active
NIC. The NIC failure caused no apparent loss.
Note
NIC aggregation in Active/Passive mode remedies only a network
disconnection or NIC failure that occurs on the Isilon node or on the
switch port to which the node is connected.
Node Poweroff Test
To simulate an unexpected single node hard failure, we held down the power
button until the node powered off. This forces the servers that were writing to
that node to reconnect to a new node. In our tests, the servers that had been
writing to the failed node reconnected to a new node, but writing to the SMB
share did not resume for up to 52 seconds in aggregate (time to reconnect plus
time to start writing).
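One way to quantify an interruption like this is to look for the largest gap between consecutive write timestamps recorded on a video server. The following is a minimal sketch, not the method used in these tests. The function name and the sample timestamps are hypothetical, and it assumes write times are available as Python datetime values.

```python
from datetime import datetime, timedelta


def longest_write_gap(timestamps):
    """Return (gap, start, end) for the largest interval between consecutive writes."""
    ordered = sorted(timestamps)
    if len(ordered) < 2:
        # Not enough samples to form an interval.
        return timedelta(0), None, None
    # Pair each write with the next one and keep the pair that is furthest apart.
    start, end = max(zip(ordered, ordered[1:]), key=lambda pair: pair[1] - pair[0])
    return end - start, start, end


# Hypothetical sample: writes one second apart, then a pause during failover.
t0 = datetime(2020, 1, 1, 12, 0, 0)
writes = [t0 + timedelta(seconds=s) for s in (0, 1, 2, 54, 55)]

gap, gap_start, gap_end = longest_write_gap(writes)
print(f"Longest gap: {gap.total_seconds():.0f} s, from {gap_start} to {gap_end}")
```

For the sample data the longest gap is 52 seconds, chosen only to mirror the figure reported above.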
A second issue is that removing or adding a node interrupts
the cluster. As a result, video servers writing to the other nodes might experience
a short interruption. The duration of the interruption can be reduced by modifying
the OneFS environment variables.