Ganglia

Monitoring clusters and Grids since the year 2000

Ganglia 3.0.2 Released

The Ganglia Development Team is pleased to announce the release of Ganglia 3.0.2 (Wilbur) which is available for immediate download from http://ganglia.info/downloads.php.

This release mainly fixes bugs. For a detailed description of the changes, see the ChangeLog included in the tarball.

Some of the highlights are:

  • New AIX metrics code
  • NetBSD support
  • “--pid-file” option for gmond and gmetad
  • Old gmond “location” statements are now handled correctly
  • “gmond --location” now works correctly
  • Compile fixes for MacOS Tiger
  • Gmond no longer core-dumps on 64-bit Linux platforms
  • cpu_wio is now reported correctly
  • PHP fixes in the web-frontend
  • many more…

The following Bugzilla entries are addressed: 27, 49, 54, 62, 63, 68, 70, 72.

This release has been tested on the following platforms:

  • Fedora FC4 / ia32
  • SuSE 9.0 / x86_64
  • RHEL3 / ia64
  • Mac OS Tiger
  • Solaris 2.8 / Sparc-64
  • AIX 5.2, 5.3

Enjoy.

The Ganglia team

Ganglia 3.0.1 Released

The Ganglia Development Team is pleased to announce the release of Ganglia 3.0.1 (Wright) which is available for immediate download from http://ganglia.info/downloads.php and features…

gmond Unicast Communication Bug Fixed
This serious bug caused unicast-only gmond to completely stop sending metric updates after network failures.

gmond.conf Conversion Bug Fixed
If you converted your old 2.5.x configuration files to 3.0.0 using the gmond conversion feature, e.g.
  % gmond --convert my_old_gmond.conf > my_new_gmond.conf

then you will want to change the host mask from 24 to 32 for all your trusted hosts, e.g.

  tcp_accept_channel {
    port = 8649
    /* your trusted_hosts assuming ipv4 mask*/
    acl {
      default = deny
      access {
        ip = <trusted_host_ip>
        mask = 24 /* <========== BUG! */
      }
    }
  }

The conversion code in 3.0.1 correctly sets the host mask to 32.
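
To see why the mask matters, here is a minimal Python sketch that compares a 24-bit and a 32-bit mask using the standard ipaddress module. The address 192.0.2.10 is a placeholder from the documentation range, and the helper names are made up for illustration. The sketch only shows the mask arithmetic; it is not gmond's own access-control code.

```python
import ipaddress


def hosts_admitted(trusted_ip: str, mask: int) -> int:
    """Return how many addresses an ip/mask pair matches."""
    # strict=False accepts a host address instead of a network address.
    network = ipaddress.ip_network(f"{trusted_ip}/{mask}", strict=False)
    return network.num_addresses


def is_admitted(client_ip: str, trusted_ip: str, mask: int) -> bool:
    """Return True if client_ip falls inside trusted_ip/mask."""
    network = ipaddress.ip_network(f"{trusted_ip}/{mask}", strict=False)
    return ipaddress.ip_address(client_ip) in network


if __name__ == "__main__":
    trusted = "192.0.2.10"  # placeholder trusted host (assumption)
    for mask in (24, 32):
        print(mask, hosts_admitted(trusted, mask),
              is_admitted("192.0.2.99", trusted, mask))
```

With a mask of 24 the neighbouring host 192.0.2.99 is admitted along with the rest of the subnet; with a mask of 32 only the trusted host itself matches.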

gmond.conf now processes include() statements
This simple feature provides more flexibility in configuring gmond, e.g.
  globals {
    include(globals.conf)
  }

Network Metrics Bug Fixed for Linux 2.6.x Kernels
A bug in the pkts_in/out and bytes_in/out collection code caused Linux 2.6.x systems to report bogus network metrics.

Cleaned up bug in RPM for package upgrades
When upgrading a previously installed ganglia package, an error in the spec file caused a file named “1” to be written into the / directory.

FreeBSD Metric Collection Enhanced
There have been a number of bug fixes and cleanups of the metric collection code for FreeBSD thanks to the work of Brooks Davis.

Host view update
The host view web pages now show the time when gmond was started on the host, thanks to the work of Jason A. Smith.

We have deployed a new bugzilla service at http://bugzilla.ganglia.info/. This site was created for you to submit bug reports, feature requests and upload patches for ganglia.

If you have found ganglia to be useful in your organization, please consider making a donation to the project at http://sourceforge.net/donate/index.php?group_id=43021

Thanks for using Ganglia!

The Ganglia Development Team

Ganglia 3.0.0 Released!

The Ganglia Development Team is pleased to announce the release of Ganglia 3.0.0 (Kittyhawk) which is available for immediate download from http://ganglia.info/downloads.php and features…

Windows Support
Ganglia now runs on Windows. There is support for all standard metrics except for disk_free, disk_total, part_max_used and cpu_num (support will be added in future releases).

We have also created a Windows installer which allows you to easily add the ganglia monitoring service to any Windows NT/2000/XP machine.

Currently, you are required to use unicast messaging since there is no support for multicast on Windows at this time (although multicast support will be added in the future).

Special thanks to Carlo Marcelo Arenas Belon for providing metric code which makes native Windows calls to collect the majority of metrics.

Unicast Support
Ganglia now allows you to send status messages over unicast routes instead of a single multicast channel. This capability gives you greater flexibility in building your monitoring overlay and allows ganglia to run on networks that are not multicast-enabled.

Moreover, you can specify as many unicast and multicast channels as you like. Whenever a message is sent, every channel receives it. This feature gives you much more power in grouping machines.

Gmetric command-line tool parses the configuration file
Gmetric now parses the gmond configuration file and sends metric information to all unicast and multicast UDP channels specified.

Apache Portable Runtime library
The Apache Portable Runtime (APR) library is the library underlying the Apache web server. It provides memory pools, network I/O, hash tables and arrays in a very portable manner. APR now serves as the heart of the new ganglia monitoring daemon to expand portability, improve reliability and provide new features like IPv6 address support.

More powerful and flexible configuration
The configuration file for gmond has changed. This change was necessary to provide you with a more flexible and powerful framework in which to configure gmond. There is a man page for gmond.conf (see man gmond.conf) which explains the new format.

To convert an old 2.5.x configuration file to the new format simply run

  % gmond --convert old.conf > new.conf

This new format allows you to specify multiple unicast and multicast channels to send and receive monitoring information, provides much more flexible access control lists, and lets you specify exactly which metrics you want to collect on each machine.

Special thanks to the developers of confuse (http://www.nongnu.org/confuse/) for building such a great file parser.

Configuration analysis gives bandwidth usage
There is a new option for gmond which allows you to get an estimate of the bandwidth that gmond will use given a particular configuration.
  % ./gmond -b /etc/gmond.conf
  7.945789 bytes/sec

This feature allows you to budget how much bandwidth you will use for monitoring your machines for a given configuration (see man gmond.conf).

More powerful Access Control mechanism
In the old 2.5.x world, the only access control mechanism available was a list of trusted_hosts.

Ganglia now supports very elaborate access control lists that allow you to specify an ip and mask (for filtering subnets) and outline the default policy (see man gmond.conf for details).
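
As a rough illustration of how an access list with a default policy behaves, here is a minimal Python sketch. The AccessRule class, the "action" field and the first-match-wins ordering are assumptions made for the example, not a description of gmond's implementation; man gmond.conf remains the reference for the real semantics.

```python
import ipaddress
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class AccessRule:
    """One hypothetical access entry: an ip, a mask and an action."""
    ip: str
    mask: int
    action: str  # "allow" or "deny"


def evaluate_acl(client_ip: str, rules: Iterable[AccessRule],
                 default: str = "deny") -> str:
    """Return the action of the first matching rule, else the default."""
    client = ipaddress.ip_address(client_ip)
    for rule in rules:
        network = ipaddress.ip_network(f"{rule.ip}/{rule.mask}", strict=False)
        if client in network:
            return rule.action
    return default


if __name__ == "__main__":
    # Placeholder addresses from the documentation ranges (assumptions).
    rules = [
        AccessRule("192.0.2.10", 32, "allow"),    # a single trusted host
        AccessRule("198.51.100.0", 24, "allow"),  # a whole trusted subnet
    ]
    for ip in ("192.0.2.10", "192.0.2.11", "198.51.100.7"):
        print(ip, evaluate_acl(ip, rules, default="deny"))
```

The default policy only applies when no entry matches, which is why a deny default combined with a few narrow allow entries gives a closed-by-default setup.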

You have complete control over metric collection
The new configuration file format allows you to specify exactly which metrics are collected. You can also specify custom time and value thresholds per metric at runtime instead of needing to modify source at compile time. This flexibility will allow us to easily add an alert mechanism in the near future.
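
The interplay of a time threshold and a value threshold can be sketched in Python as follows. This is an illustration under assumptions, not gmond's code: the MetricState class and the exact comparison rules (send when the time threshold has elapsed, or earlier when the value has moved by at least the value threshold) are made up for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MetricState:
    """Hypothetical per-metric bookkeeping for the send decision."""
    time_threshold: float   # maximum seconds between two sends
    value_threshold: float  # minimum change that triggers an early send
    last_sent_at: Optional[float] = None
    last_sent_value: Optional[float] = None


def should_send(state: MetricState, now: float, value: float) -> bool:
    """Decide whether a freshly collected value should be sent."""
    if state.last_sent_at is None:
        return True  # nothing has been sent yet
    if now - state.last_sent_at >= state.time_threshold:
        return True  # too long since the last update
    return abs(value - state.last_sent_value) >= state.value_threshold


def record_send(state: MetricState, now: float, value: float) -> None:
    """Remember what was sent and when."""
    state.last_sent_at = now
    state.last_sent_value = value


if __name__ == "__main__":
    load = MetricState(time_threshold=60.0, value_threshold=1.0)
    for now, value in [(0, 0.5), (10, 0.9), (20, 2.0), (90, 2.1)]:
        if should_send(load, now, value):
            record_send(load, now, value)
            print(f"t={now}s send {value}")
        else:
            print(f"t={now}s skip {value}")
```

Raising the time threshold or the value threshold for a metric reduces how often it is sent, which is what makes these settings useful for tuning monitoring traffic.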

RPMs were renamed on Linux
The RPMs have been renamed to make their names simpler:
  ganglia-monitor-core-gmond    =>   ganglia-gmond
  ganglia-monitor-core-gmetad   =>   ganglia-gmetad
  ganglia-monitor-core-lib      =>   ganglia-devel
  ganglia-webfrontend           =>   ganglia-web

Major cleanup of ganglia-devel
Lots of unnecessary headers were removed from libganglia and a ganglia-config script was added for applications that link against ganglia (see ganglia-config --help for details).

ganglia-devel now installs only the following files

  /usr/bin/ganglia-config
  /usr/include/ganglia.h
  /usr/lib/libganglia.a
  /usr/lib/libganglia.la
  /usr/lib/libganglia.so

Solaris gmond doesn’t have to be run as root anymore
Special thanks to Adeyemi Adesanya for switching the Solaris metric gathering code from kvm to kstat, eliminating the need to run gmond as root. Gmond on Solaris can now setuid to any user that you like (see man gmond.conf for details).

Mixing different OSes on same channel is okay now
There was a bug in 2.5.x that caused Solaris and HPUX hosts to interpret metric data from other operating systems incorrectly. You can now mix any and all supported operating systems on a single communication channel with no problems.

Fixed the XML DTD
In certain circumstances, gmond would export invalid XML because the DTD was too restrictive. The DTD has been updated to prevent this error.

Darwin metric collection greatly improved
Darwin now supports mem_total, bytes_in, bytes_out, pkts_in, pkts_out, proc_run, disk_total, disk_free and part_max_used metrics. Special thanks to Sebastian Hagedorn, Glen Beane, Joshua Durham, Eric Wages and Brian Peterson for their work on MacOS X.

Fixed bug that required Solaris systems to run in debug mode
Gmond wasn’t properly daemonizing on certain Solaris systems, which required it to be run in debug_mode with the output redirected to /dev/null. This bug no longer exists.

Fixed a memory leak on FreeBSD
Brooks Davis fixed a memory leak reported by Glen Beane in find_disk_space() and a potential memory leak in makenetvfslist(). General clean up of makenetvfslist().

All metric collection functions are in a standalone library
All the metric code has been moved to ./srclib/libmetrics in the ganglia distribution. Special thanks to Martin Knoblauch for his hard work in cleaning up the metric collection code.

Potential memory leak fixed in gmetad
Marcelo Veiga Neves determined how a memory leak was possible for metrics sent via gmetric. Federico Sacerdoti applied a fix to prevent any leaks.

All web scripts are in the ./web directory of the distribution now
The PHP web scripts have been incorporated into the main ganglia distribution. Minor bug fixes were contributed by Ramon Bastiaans and Jason Smith.

All communication protocols are now defined in ./lib/protocol.x
To help in integrating ganglia communications into other applications, all XDR communication formats are defined in ./lib/protocol.x. This XDR description file can be parsed by rpcgen, for example, to build XDR code for sending and receiving status messages.
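
For readers unfamiliar with XDR, here is a minimal Python sketch of the generic XDR wire encoding for two basic types, using only the standard struct module. It follows the XDR standard (big-endian 4-byte integers, strings prefixed by their length and zero-padded to a multiple of 4 bytes). It does not reproduce the message layout in ./lib/protocol.x; the function names and the sample values are assumptions for illustration.

```python
import struct
from typing import Tuple


def xdr_uint(value: int) -> bytes:
    """Encode an unsigned 32-bit integer as 4 big-endian bytes."""
    return struct.pack(">I", value)


def xdr_string(text: str) -> bytes:
    """Encode a string: 4-byte length, the bytes, zero padding to 4."""
    data = text.encode("utf-8")
    padding = (4 - len(data) % 4) % 4
    return struct.pack(">I", len(data)) + data + b"\x00" * padding


def xdr_read_string(buf: bytes, offset: int = 0) -> Tuple[str, int]:
    """Decode a string at offset; return it and the next offset."""
    (length,) = struct.unpack_from(">I", buf, offset)
    start = offset + 4
    text = buf[start:start + length].decode("utf-8")
    padded_length = ((length + 3) // 4) * 4  # skip the padding bytes too
    return text, start + padded_length


if __name__ == "__main__":
    encoded = xdr_string("cpu_wio") + xdr_uint(8649)
    name, next_offset = xdr_read_string(encoded)
    (number,) = struct.unpack_from(">I", encoded, next_offset)
    print(name, number, len(encoded))
```

In practice the field order and types should be taken from ./lib/protocol.x itself, for example by generating code with rpcgen as described above.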

Added a --foreground flag to gmond
Allows you to force gmond to run in the foreground.

Gmetad on Solaris bug fixed
David Wood fixed a bug creating directories on Solaris.

We have deployed a new bugzilla service at http://bugzilla.ganglia.info/. This site was created for you to submit bug reports, feature requests and upload patches for ganglia.

If you have found ganglia to be useful in your organization, please consider making a donation to the project at http://sourceforge.net/donate/index.php?group_id=43021

Thanks for using Ganglia!

The Ganglia Development Team

bugzilla.ganglia.info

A new Bugzilla site has been created to make submitting bug reports and patches to the ganglia project much easier.

http://bugzilla.ganglia.info/

This site was created in anticipation of the upcoming 3.0.0 release. We hope this service will allow us to organize and move more quickly to add features and stomp out bugs.


Ganglia Is Part of OSCAR 4.0

The Open Cluster Group is pleased to announce the release of OSCAR version 4.0.

Feature list of 4.0:

  • Red Hat Linux 9, Red Hat Enterprise Linux (RHEL) 3, and Fedora Core 2 support
  • New RPM dependency finder helps build the server (DepMan/PackMan)
  • SIS 3.3.2
  • Ganglia is now included in the distribution
  • Torque is now included as the default scheduler (OpenPBS can still be downloaded from OPD)
  • Multiple bug fixes and Wizard improvements

This release supports both x86 and Itanium systems. Itanium support is provided by RHEL 3.

This release is available for download from the OSCAR project website: http://oscar.openclustergroup.org

#ganglia Channel on Freenode

You can chat with other ganglia users and developers at IRC channel #ganglia on freenode (special thanks to Majestik and bli_gsc for serving as channel ops).

RocketCalc SSH/authd Solution

RocketCalc has developed and maintains a version of OpenSSH that uses Ganglia’s authd authentication. The following is a quote from their web site, where the software resides:

Authd is a simple SSL-based authentication mechanism that makes it simple to authenticate users on clusters. Authd was written by Brent N. Chun and is part of the Ganglia package. Authd is used by gexec to launch processes on cluster nodes. We really like gexec and authd. Although gexec is supremely capable for launching jobs, we also thought it would be cool to add authd as an authentication method to OpenSSH. Our patches and source code below do just that. On systems with Ganglia/authd/gexec installed, one can add the modified ssh and use authd for reasonably secure, password-less user authentication on the cluster without any set-up required by the user.

Ganglia-python 3.3.0 Released

Dear Ganglia Developers and Users,

This second announcement is for an updated ganglia-python package in Ganglia. It contains code from the latest Rocks release which has shown itself to be useful to the greater Ganglia audience.

Ganglia Python 3.3.0

  • From the Rocks 3.3.0 release. Various improvements, including new command-line options to see only dead nodes.
  • New addition of the Ganglia-news RSS cooker. This tool generates an RSS stream for a ganglia cluster, notifying about dead nodes, etc.
  • Please see the release notes for more information on using the ganglia-news RSS system.

Enjoy!

Federico
Rocks Cluster Group, San Diego Supercomputer Center, CA

2.5.7 Released!

Dear Ganglia Developers and Users,

We are pleased to announce the release of Ganglia 2.5.7. This is a minor feature enhancement to the monitor core and webfrontend. If you are a US citizen, this is your tax dollars hard at work :) Ganglia is free for download at ganglia.sf.net.

Ganglia Monitor Core 2.5.7

  • New gmetad cleanup thread prevents metric explosion in certain cases.
  • Gmetad presents a more accurate “TN” in XML.

Ganglia Webfrontend 2.5.7

  • Cleaner static metrics in host view
  • Interface refinements: GB units in phys view
  • New host gmetric view

Enjoy!

The Ganglia Management Team

Linux POSIX Threads

People who use gexec and pcp on the latest Linux kernels will find that they hang when executed. The problem is that Linux 2.4.x doesn’t implement the full set of POSIX cancellation points (e.g., sem_wait, sigwait, etc. are not implemented). This, it turns out, is the fundamental cause of gexec and pcp hanging on these systems. Also, terminal-related signals (e.g., SIGTTIN) don’t appear to be handled correctly. I’m told that in 2.6.x kernels, some of these problems have been fixed. But in the meantime, set your LD_ASSUME_KERNEL environment variable before you start gexec daemons or clients.

export LD_ASSUME_KERNEL="2.4.10"

In the future most (if not all) ganglia components will not rely on POSIX threads at all given the chaotic nature of threads on Linux.