Monitoring clusters and Grids since the year 2000

Monitoring Core v2.1.0 Is Released!

  • Completely rewrote the underlying hash library because the original hash functions were over-engineered and had a memory bug on certain platforms. The new hash functions are superlight and fast. Built a test program and profiled/traced all memory functions using mpatrol. No leaks. Special thanks to Mike Howard for letting me test gmond on his cluster, which displayed the memory bug. Also thanks to Alan Hagg and Rod Hernandez for patiently answering my questions about the memory bug on their clusters. Your help was appreciated!
  • Updated the code to catch transient nameservice errors when they occur and retry. Hosts that don’t resolve are now handled correctly instead of being treated as an error
  • Added a patch submitted by Joshua J England for gmond to correctly report the number of CPUs and their speed on alpha architectures
  • Added a patch submitted by Eirikur Hallgrimsson and written by Yaroslav Klyukin for gmetric which allows users to choose the network interface on which gmetric multicasts metric data
  • Changed the “safe_host” option to “trusted_host” to make it clearer. Also added the “num_nodes” and “num_custom_metrics” options for more efficient in-memory cluster image creation
  • Reduced the number of total threads by one by removing the for(;;)pause() spin and having the main thread do server work
  • Created the my_inet_ntop() function in libganglia to deal with the limitations of inet_ntoa in a multi-threaded environment
  • Changed the self-organizing behavior of gmond to recognize when a transient error occurred on a remote gmond process
  • Added verbose error checking of gethostbyaddr() in listen.c
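
For readers wondering about the inet_ntoa limitation mentioned above: inet_ntoa returns a pointer to a single static buffer, so two threads formatting addresses at the same time can overwrite each other's result. The sketch below is not the actual my_inet_ntop() from libganglia (its source is not shown here). It only illustrates the general fix, using the standard inet_ntop() call with a caller-supplied buffer. The helper name format_ipv4 is hypothetical.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>

/* Hypothetical helper (an assumption, not the libganglia function):
 * formats an IPv4 address into a buffer owned by the caller, so no
 * static storage is shared between threads.
 * Returns buf on success, NULL if the buffer is too small. */
const char *format_ipv4(struct in_addr addr, char *buf, size_t len)
{
    return inet_ntop(AF_INET, &addr, buf, (socklen_t)len);
}
```

Each thread would pass its own buffer of at least INET_ADDRSTRLEN bytes, for example a local char array, so concurrent calls cannot interfere with each other.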
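
The nameservice changes above (retrying on transient errors, and not treating an unresolvable host as a failure) can be pictured roughly as follows. This is a minimal sketch under assumptions, not the code in listen.c: the names classify_h_errno and lookup_with_retry, the return codes, and the one-second pause are all made up for illustration. It relies only on the standard gethostbyaddr() call and the h_errno values from netdb.h.

```c
#include <netdb.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical classification of a failed lookup (names are assumptions). */
enum lookup_action { LOOKUP_RETRY, LOOKUP_UNRESOLVED, LOOKUP_ERROR };

enum lookup_action classify_h_errno(int err)
{
    switch (err) {
    case TRY_AGAIN:      return LOOKUP_RETRY;      /* transient, worth retrying */
    case HOST_NOT_FOUND:
    case NO_DATA:        return LOOKUP_UNRESOLVED; /* no name exists, not an error */
    default:             return LOOKUP_ERROR;      /* e.g. NO_RECOVERY */
    }
}

/* Reverse-resolves addr into name. Returns 0 on success, 1 if the address
 * simply has no name (name is set to ""), and -1 on a hard error or when
 * max_tries attempts have been used up.
 * Note: gethostbyaddr() uses static storage, so a threaded daemon would
 * need to serialize calls to this function. */
int lookup_with_retry(struct in_addr addr, char *name, size_t len, int max_tries)
{
    for (int attempt = 0; attempt < max_tries; attempt++) {
        struct hostent *h = gethostbyaddr(&addr, sizeof addr, AF_INET);
        if (h != NULL) {
            snprintf(name, len, "%s", h->h_name); /* copy out of static storage */
            return 0;
        }
        switch (classify_h_errno(h_errno)) {
        case LOOKUP_RETRY:
            sleep(1);          /* assumed pause before the next attempt */
            break;
        case LOOKUP_UNRESOLVED:
            if (len > 0)
                name[0] = '\0';
            return 1;
        default:
            return -1;
        }
    }
    return -1;
}
```

A caller that gets 1 back could fall back to the dotted-quad address as the host's label rather than dropping the host.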