
This page contains old information that used to be on the HaNFS[1] page but was moved here once it was determined that locks do not survive failover with current kernels. It is saved here in case that problem is one day solved. -- Dave Dykstra

HA-NFS testing

In order to verify the behavior of NFS locking, we did extensive testing of NFS in an HA environment with Heartbeat[2]. This section describes the tests and their results. We tested NFS I/O with Bonnie++, and tested NFS with the Connectathon suite and with a multiple-client NFS locking test of our own design.

Test environment

The two Heartbeat nodes, posic066 and posic067, used the following ha.cf[3]:

debugfile /var/log/ha-debug
logfile /var/log/ha-log
logfacility     local0
keepalive 2
deadtime 10
warntime 10
initdead 20
udpport 694
bcast   eth0            # Linux
auto_failback off
node    posic066
node    posic067
apiauth ping gid=haclient uid=gshi,hacluster
apiauth ccm  gid=haclient uid=hacluster
apiauth evms gid=haclient uid=root
apiauth ipfail gid=haclient uid=gshi,hacluster

Bonnie++

If you don't use any locking, NFS works quite well with Linux-HA. Bonnie++ (version 1.03a, available from http://www.coker.com.au/bonnie++/[4]) ran to completion in about 6 hours while the two NFS servers failed over back and forth every 2 minutes.

Connectathon lock test

Steps to run a test:

Multiple clients lock test

The source code for the multiple-client lock test can be found in the contrib/mlock/ directory of the Linux-HA Mercurial[7] repository.
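The contrib/mlock/ code itself is not reproduced here. As a rough illustration of what a multiple-client locking test checks, the following C sketch (our own; it is not the mlock source) has each client repeatedly take a whole-file write lock with fcntl(F_SETLKW), increment a counter stored in the shared file, and release the lock. If locking holds across all clients, the final counter equals clients * iterations; a lost lock shows up as a smaller value or as a lock error. All function names are hypothetical, and run_local_clients only forks processes on one machine as a smoke test; in an HA-NFS setup, increment_under_lock would instead run on each NFS client against the same file on the NFS mount.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Block until a whole-file lock of the given type is granted (F_WRLCK),
 * or release it (F_UNLCK). */
static int whole_file_lock(int fd, short type)
{
    struct flock fl;
    memset(&fl, 0, sizeof fl);
    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;                       /* 0 = through end of file */
    return fcntl(fd, F_SETLKW, &fl);
}

/* One client's work: `iterations` times, take the lock, read the counter
 * stored at offset 0, increment it, write it back, release the lock.
 * Returns 0 on success, -1 on the first lock or I/O error. */
static int increment_under_lock(const char *path, int iterations)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;
    for (int i = 0; i < iterations; i++) {
        long counter = 0;               /* an empty file counts as 0 */
        if (whole_file_lock(fd, F_WRLCK) < 0)
            goto fail;
        ssize_t n = pread(fd, &counter, sizeof counter, 0);
        if (n != 0 && n != (ssize_t)sizeof counter)
            goto fail;
        counter++;
        if (pwrite(fd, &counter, sizeof counter, 0) != (ssize_t)sizeof counter)
            goto fail;
        if (whole_file_lock(fd, F_UNLCK) < 0)
            goto fail;
    }
    close(fd);
    return 0;
fail:
    close(fd);                          /* closing also drops our locks */
    return -1;
}

/* Read the final counter value; -1 on error. */
static long read_counter(const char *path)
{
    long counter = 0;
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    ssize_t n = pread(fd, &counter, sizeof counter, 0);
    close(fd);
    return n == (ssize_t)sizeof counter ? counter : -1;
}

/* Local smoke test: fork `clients` processes that each increment the
 * counter `iterations` times. Returns the number of clients that failed. */
static int run_local_clients(const char *path, int clients, int iterations)
{
    int failed = 0, status;
    for (int c = 0; c < clients; c++) {
        pid_t pid = fork();
        if (pid < 0)
            failed++;                   /* could not start this client */
        else if (pid == 0)
            _exit(increment_under_lock(path, iterations) == 0 ? 0 : 1);
    }
    while (wait(&status) > 0)
        if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
            failed++;
    return failed;
}
```

The read-increment-write pattern only works across machines because, as we understand it, the Linux NFS client flushes dirty data before releasing a lock and revalidates its cache when acquiring one.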

Steps to run a test:

We also tried a wrapper function that overrides fcntl: if the first fcntl call fails, the wrapper calls it a second time. With this wrapper, the lock calls sometimes succeed, but they can still fail.
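The original wrapper is not included on this page, and the page does not say how the override of fcntl was installed. The following C sketch therefore shows only the retry logic described above, as an ordinary function rather than a replacement for fcntl itself. The names fcntl_lock_retry and set_whole_file_lock are hypothetical.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical retry wrapper for fcntl() record-lock commands such as
 * F_SETLK and F_SETLKW: if the first call fails, call fcntl() exactly
 * once more. Returns the result of the last call; on failure, errno is
 * whatever the second attempt set. */
static int fcntl_lock_retry(int fd, int cmd, struct flock *fl)
{
    int rc = fcntl(fd, cmd, fl);
    if (rc == -1)
        rc = fcntl(fd, cmd, fl);        /* second and final attempt */
    return rc;
}

/* Example caller: take (F_WRLCK) or drop (F_UNLCK) a lock on the whole
 * file through the retry wrapper, without blocking. */
static int set_whole_file_lock(int fd, short type)
{
    struct flock fl;
    memset(&fl, 0, sizeof fl);          /* l_start = 0, l_len = 0: whole file */
    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    return fcntl_lock_retry(fd, F_SETLK, &fl);
}
```

A blind second attempt cannot tell a transient failure from a persistent one; with F_SETLK, for example, a lock genuinely held by another client fails with EACCES or EAGAIN on both tries, which fits the observation that the wrapper only sometimes helps.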

Bug in NFS?

If a server failover happens while a client is running a lock test, unmounting the file system on the server may fail. The lock test we ran was Connectathon. This is easy to reproduce with only two machines (one server and one client):

  1. the server mounts a disk device to a directory
  2. the server starts nfs and nfslock
  3. the client mounts the exported directory from the server
  4. the client runs the Connectathon lock test
  5. the server shuts down nfs and nfslock
  6. the server tries to unmount the device
    • ====> returns error: the device is busy
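Step 6 fails inside the umount(2) system call with errno set to EBUSY, which the umount command reports as the device being busy. A minimal C sketch of that check, assuming it runs as root on the server; the helper name try_unmount is hypothetical:

```c
#include <errno.h>
#include <sys/mount.h>

/* Hypothetical helper for step 6: attempt to unmount dir.
 * Returns 0 if it was unmounted, 1 if umount(2) failed with EBUSY
 * (the "device is busy" error above), and -1 for any other error,
 * such as ENOENT, EINVAL (not a mount point) or EPERM (not root). */
static int try_unmount(const char *dir)
{
    if (umount(dir) == 0)
        return 0;
    return errno == EBUSY ? 1 : -1;
}
```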

We always used the same kernel version on both the server and the client. We tried kernels 2.4.20, 2.4.26 and 2.6.5-1.339 with Red Hat 9, and all of them fail the same way.

However, this is not a disaster from an HA point of view: Linux-HA (version 1.2.1 or newer) automatically reboots the machine if this occurs, so that service can continue. Although this is annoying, service continues virtually uninterrupted, and the integrity of the locks and data is unaffected.

JeffLayton[8] found a fix for this problem on the linux-NFS mailing list; as of May 2004, distributions still needed to incorporate it into their NFS shutdown scripts. According to Jeff[9], sending a SIGKILL signal to the lockd kernel thread makes it release all of its locks, after which the file system can be unmounted. This was discussed earlier on lkml[10].
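A minimal C sketch of the shutdown step Jeff describes, meant to run after stopping the NFS services and before unmounting: find the lockd kernel thread by name and send it SIGKILL. It assumes root privileges and a kernel that provides /proc/<pid>/comm (Linux 2.6.33 and later; on the 2.4 and early 2.6 kernels tested above, the name would have to be parsed from /proc/<pid>/stat instead). The function names are hypothetical, and this is not taken from any distribution's shutdown script.

```c
#include <ctype.h>
#include <dirent.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Return the PID of the first process whose /proc/<pid>/comm equals
 * name, or -1 if there is none (or /proc cannot be read). */
static pid_t find_pid_by_comm(const char *name)
{
    DIR *proc = opendir("/proc");
    if (proc == NULL)
        return -1;

    pid_t found = -1;
    struct dirent *ent;
    while (found == -1 && (ent = readdir(proc)) != NULL) {
        if (!isdigit((unsigned char)ent->d_name[0]))
            continue;                   /* not a process directory */
        char path[288];
        snprintf(path, sizeof path, "/proc/%s/comm", ent->d_name);
        FILE *f = fopen(path, "r");
        if (f == NULL)
            continue;                   /* process exited meanwhile */
        char comm[64];
        if (fgets(comm, sizeof comm, f) != NULL) {
            comm[strcspn(comm, "\n")] = '\0';
            if (strcmp(comm, name) == 0)
                found = (pid_t)strtol(ent->d_name, NULL, 10);
        }
        fclose(f);
    }
    closedir(proc);
    return found;
}

/* Send SIGKILL to lockd so that it drops all of its locks.
 * Returns 0 on success, -1 if lockd was not found or kill() failed. */
static int kill_lockd(void)
{
    pid_t pid = find_pid_by_comm("lockd");
    if (pid == -1)
        return -1;
    return kill(pid, SIGKILL);
}
```

Per Jeff's report, once kill_lockd() succeeds, the unmount in the reproduction steps should no longer fail with "device is busy".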

Conclusion

If client applications do not use file locking, HA NFS works very well. However, if a client application uses locking, it may get errors that it would not get with a single NFS server. IMHO there are some bugs in NFS that cause the problems above. -- GuochunShi[11]

However, most of these are now fixed if you're running the right NFS kernel. Occasional lock failures under intensive locking can still occur, though, and there is not yet any known solution. -- AlanRobertson[12]


References

[1] http://www.linux-ha.org/HaNFS
[2] http://www.linux-ha.org/HeartbeatProgram
[3] http://www.linux-ha.org/ha.cf
[4] http://www.coker.com.au/bonnie++/
[5] http://www.linux-ha.org/ha.cf/AutoFailbackDirective
[6] http://www.linux-ha.org/haresources
[7] http://www.linux-ha.org/Mercurial
[8] http://www.linux-ha.org/JeffLayton
[9] http://lists.community.tummy.com/pipermail/linux-ha/2004-May/011128.html
[10] http://seclists.org/lists/linux-kernel/2002/Sep/1841.html
[11] http://www.linux-ha.org/GuochunShi
[12] http://www.linux-ha.org/AlanRobertson


This information provided courtesy of the Linux-HA project at http://linux-ha.org/