Ganglia

Monitoring clusters and Grids since the year 2000

Linux POSIX Threads

People who use gexec and pcp on the latest Linux kernels will find that both programs hang when executed. The problem is that Linux 2.4.x doesn’t implement the full set of POSIX cancellation points (e.g., sem_wait and sigwait are not implemented as cancellation points). This, it turns out, is the fundamental cause of gexec and pcp hanging on these systems. Terminal-related signals (e.g., SIGTTIN) also don’t appear to be handled correctly. I’m told that some of these problems have been fixed in 2.6.x kernels. In the meantime, set the LD_ASSUME_KERNEL environment variable before you start gexec daemons or clients:

export LD_ASSUME_KERNEL="2.4.10"
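If you start several daemons or clients from scripts, a small shell helper can make sure the variable is always set. The sketch below is an assumption, not part of the gexec or pcp distributions: the function names assumed_kernel_version and with_assumed_kernel are hypothetical, and the helper simply defaults to the 2.4.10 value above while keeping any value already present in the environment. It is meant for the Linux 2.4-era systems this page describes.

```shell
# Hypothetical helper functions (not shipped with gexec or pcp).
# Source this file from a startup script, then launch programs through
# with_assumed_kernel.

# Print the kernel version to assume: keep a value already set in the
# shell, otherwise fall back to 2.4.10.
assumed_kernel_version() {
    printf '%s\n' "${LD_ASSUME_KERNEL:-2.4.10}"
}

# Run "$@" (an external program such as a gexec daemon or client) with
# LD_ASSUME_KERNEL set in that program's environment only. The prefix
# assignment does not change the calling shell's environment.
with_assumed_kernel() {
    if [ "$#" -eq 0 ]; then
        echo "usage: with_assumed_kernel <command> [args...]" >&2
        return 2
    fi
    LD_ASSUME_KERNEL="$(assumed_kernel_version)" "$@"
}
```

After sourcing the file, start each program as: with_assumed_kernel <program> [args...]. Using a prefix assignment scopes the variable to that one command, whereas the export line above affects every program started later from the same shell.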

In the future, most (if not all) Ganglia components will not rely on POSIX threads at all, given how unpredictably threads behave on Linux.