serial port issues on IBM xseries with FC4 and High Availability heartbeat

Rick Stevens rstevens at vitalstream.com
Fri May 26 17:22:15 UTC 2006


On Fri, 2006-05-26 at 09:27 -0400, Randy Grimshaw wrote:
> 
> I am trying to run a linux high availability cluster (failover pair)
> using serial as one of the heartbeats.
> 
> Due to numerous serial over-runs the systems are actually crashing
> periodically.
> 
> This is a very frustrating development for a system intended to provide
> HA. (certainly not ha ha ha).
> 
> I have updated to the latest bios.
> I have checked RTS DTS XON XOFF etc.
> This is happening with the stock and custom kernels.
> This is happening on three pairs of servers.
> The serial ports are detected as:
>        Serial: 8250/16550 driver $Revision: 1.90 $ 32 ports, IRQ sharing enabled
>        serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
> 
> 
> Any advice would be greatly appreciated.

The most common cause of overruns is running too high a baud rate.
Remember, a 16550A only has a 16-byte receive FIFO.  At 38,400 baud
(8N1, so 10 bits per byte), a byte arrives about every 260 microseconds
and the FIFO fills in a little over 4 milliseconds.  At 9600 baud a byte
arrives a tiny bit over every 1 millisecond and the FIFO fills in about
17 milliseconds.  Flow control tries to prevent overflows.
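
If you want to check the arithmetic for other speeds, here is a rough
Python sketch.  It assumes 8N1 framing (1 start + 8 data + 1 stop = 10
bits per byte) and a sender that never pauses; the function names are
just mine for illustration.  It prints the time per byte and the time
for a continuous stream to fill an empty 16-byte FIFO.

```python
# Rough timing for a 16550-style receive FIFO.
# Assumption: 8N1 framing, i.e. 10 bits on the wire per data byte.
FIFO_BYTES = 16
BITS_PER_BYTE = 10


def byte_time_us(baud):
    """Microseconds for one byte to arrive at the given baud rate."""
    return BITS_PER_BYTE / baud * 1_000_000


def fifo_fill_time_ms(baud, fifo_bytes=FIFO_BYTES):
    """Milliseconds for a continuous stream to fill an empty FIFO."""
    return byte_time_us(baud) * fifo_bytes / 1000


if __name__ == "__main__":
    for baud in (9600, 19200, 38400, 115200):
        print(f"{baud:>6} baud: {byte_time_us(baud):7.1f} us/byte, "
              f"FIFO full in {fifo_fill_time_ms(baud):5.2f} ms")
```

The real deadline is usually tighter than the fill time, since the chip
raises its interrupt only once the FIFO is already partly full.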

Without flow control, if the machine is busy the interrupt from the
chip may not be serviced in time, and you'll lose data because the FIFO
has filled.  Dropping the baud rate should help, and make sure you use
hardware (RTS/CTS) flow control.  Remember that software (XON/XOFF)
flow control requires the CPU to watch the buffer and send an XOFF when
it gets full.  If the CPU isn't getting to the buffer in time already,
software flow control won't help.
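
For reference, this is roughly what "RTS/CTS on, XON/XOFF off, lower
baud rate" looks like at the termios level, sketched with Python's
standard termios module.  The device path /dev/ttyS0 and the 9600 baud
default are assumptions for illustration, and whatever program owns the
port (heartbeat in this case) may well apply its own settings when it
opens it, so treat this as a picture of the flags involved rather than
a fix.

```python
import os
import termios

# Indexes into the list returned by termios.tcgetattr()
IFLAG, OFLAG, CFLAG, LFLAG, ISPEED, OSPEED, CC = range(7)


def with_hw_flow_control(attrs, speed=termios.B9600):
    """Return a copy of attrs with RTS/CTS on, XON/XOFF off, and a new speed."""
    new = list(attrs)
    new[IFLAG] &= ~(termios.IXON | termios.IXOFF)  # software flow control off
    new[CFLAG] |= termios.CRTSCTS                  # hardware flow control on
    new[ISPEED] = speed
    new[OSPEED] = speed
    return new


def configure_port(path="/dev/ttyS0", speed=termios.B9600):
    """Apply the settings to a real port (the default path is an assumption)."""
    # O_NONBLOCK keeps open() from waiting on the carrier-detect line.
    fd = os.open(path, os.O_RDWR | os.O_NOCTTY | os.O_NONBLOCK)
    try:
        attrs = with_hw_flow_control(termios.tcgetattr(fd), speed)
        termios.tcsetattr(fd, termios.TCSANOW, attrs)
    finally:
        os.close(fd)
```

Hardware flow control also needs a cable that actually carries RTS and
CTS between the two machines.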

Heartbeat links between nodes in a cluster are NOT the place to
scrimp and save money!  NICs are relatively cheap, after all.  They have
much bigger buffers, and they use DMA to move data into memory instead
of handing it to the CPU one byte at a time over the I/O ports.  Frankly,
NICs are far more reliable, especially for something this critical.

----------------------------------------------------------------------
- Rick Stevens, Senior Systems Engineer     rstevens at vitalstream.com -
- VitalStream, Inc.                       http://www.vitalstream.com -
-                                                                    -
-         The world is coming to an end ... SAVE YOUR FILES!!!       -
----------------------------------------------------------------------



