[ale] Linux-HA to convert a slave to a master

Richard Bronosky Richard at Bronosky.com
Fri Mar 5 23:04:52 EST 2010


Powering off the master is not enough. The "master" must check the
"slave" to decide whether or not to start up the "service". This
allows you to power the machine back up to correct the issue (even
remotely). The problem with "shoot the other node in the head" is that
you don't bury the hardware, you reuse it.

My favorite approach is to have an "IP baton" that you pass between
the machines on a virtual interface like eth0:0. That way the clients
that rely on the HA service don't have to change anything when a
failover happens. I imagine you could use a managed switch for extra
assurance, but I've never tried that. You can service the machines on
the IP that stays with eth0, but eth0:0, which is configured for the
same IP on both machines, doesn't come up on boot. Instead you have a
process that tests for the availability of that service on the baton
IP. If the service is not found, then the interface is brought up
locally and the service is bound to it.
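
For what it's worth, the baton check could be sketched roughly like
this in Python. It is only a sketch under assumptions that are not part
of any real setup: the baton IP 192.0.2.10, port 3306, the netmask, and
all function names are placeholders, and it assumes ifconfig (net-tools)
is how you bring up the alias, which needs root.

```python
import socket
import subprocess

BATON_IP = "192.0.2.10"    # hypothetical baton IP, same on both machines
SERVICE_PORT = 3306        # hypothetical port of the HA service
ALIAS = "eth0:0"           # virtual interface that carries the baton


def service_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def baton_command(alias=ALIAS, ip=BATON_IP, netmask="255.255.255.0"):
    """Build the ifconfig command that brings the baton alias up."""
    return ["ifconfig", alias, ip, "netmask", netmask, "up"]


def bring_up_alias():
    """Bring the alias up on this machine (assumes root and net-tools)."""
    subprocess.check_call(baton_command())


def check_once(probe, bring_up, attempts=3):
    """Claim the baton only if every probe attempt fails.

    probe and bring_up are passed in as callables so the decision
    logic can be exercised without touching the network.
    Returns True if the baton was claimed.
    """
    for _ in range(attempts):
        if probe():
            return False   # service answers on the baton IP, nothing to do
    bring_up()
    return True
```

On each machine you would call something like
check_once(lambda: service_reachable(BATON_IP, SERVICE_PORT), bring_up_alias)
from cron or a loop, then start the service once the call returns True.
It does nothing to stop both machines from grabbing the baton at the
same moment; that still has to be handled separately.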

At this point I hope you are realizing why I quoted master and slave.
There is no master or slave. They are the same. It just depends on how
each came to the party. We could do a better job of helping you if
you'd tell us more about the service that you want to make HA, and the
way they are networked. From here on, I'm going to assume it is a
database, since that is my area of expertise.

I really dislike the crossover cable idea. I feel like an eth card is
a terrible thing to waste™. I'd rather bond the 2 together, or use one
eth for one thing (like incoming connections to a database) and the
other eth for another thing (like replicating to the slave). Also, you
want to test the service on the same NIC that the clients are going
to be using the service on. What if the reason that the master is not
serving the clients is that eth0 died? Well, the slave testing the
master via crossover cable on eth1 will have no clue that there is a
problem. WTF! (worse than failure)
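
One way to keep the health check on the client-facing side is to bind
the probe's source address to the address that stays with eth0. A
minimal Python sketch, assuming made-up names and addresses (probe_via,
the IPs, and the port are all hypothetical). Binding the source address
only picks the local address; which interface the packets actually
leave by is still up to the routing table, so treat this as an
approximation.

```python
import socket


def probe_via(local_ip, host, port, timeout=2.0):
    """Try a TCP connect to host:port using local_ip as the source address.

    local_ip would be the address that stays with eth0 on the probing
    machine, so the check goes out the same side the clients use.
    """
    try:
        conn = socket.create_connection(
            (host, port), timeout=timeout, source_address=(local_ip, 0)
        )
    except OSError:
        return False
    conn.close()
    return True
```

For example, probe_via("192.0.2.21", "192.0.2.10", 3306) would probe
the baton IP from this machine's own eth0 address.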


On Fri, Mar 5, 2010 at 9:03 PM, Jim Lynch <ale_nospam at fayettedigital.com> wrote:
> Jim Kinney wrote:
>> Yep. Ping is not a reliable test. Better to actually use the slave to
>> probe the master for the same functionality the master is supposed to
>> be serving. Likewise setup a watchdog process on the master to look at
>> the slave for a confirmation file. If it's present, all is ok. If it's
>> missing, restart the process and flag a counter. Wait an appropriate
>> restart time and retest the slave for confirmation. If it still
>> missing, signal the slave for imminent shutdown and call a halt.
> To make it work without conflicts, have the slave power the offending
> master off.
>
> Jim.



-- 
.!# RichardBronosky #!.


