Padrino - rspec - adding dynamic code to all controllers during testing · 5 February 2013, 13:34
We add code that lets Capybara tests send mock parameters to any controller action during testing, so we can simulate session state (for example, a user being logged in). In Rails you do this by re-opening ApplicationController in spec/spec_helper.rb and adding a before_filter. In Padrino you can do it by adding a before block to app.rb; the before block is called for every controller action.

    configure :test do
      before do
        params.keys.each do |param|
          if param =~ /^mock_/
            # Strip only the leading mock_ prefix to get the session key
            mock_param = param.sub(/^mock_/, '')
            session[mock_param] = params[param]
            logger.debug %{#{mock_param} set to #{params[param]}}
          end
        end
      end
    end
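To make the prefix handling concrete, here is a minimal sketch of the same mapping in plain Ruby, outside Padrino. The helper name apply_mock_params and the sample parameter names are made up for illustration; in a Capybara spec the parameters would typically arrive as a query string on the visited URL.

```ruby
# Copy every "mock_"-prefixed request parameter into the session hash,
# dropping the prefix. apply_mock_params is a hypothetical helper name.
def apply_mock_params(params, session)
  params.each do |name, value|
    next unless name.start_with?('mock_')
    session[name.sub(/\Amock_/, '')] = value
  end
  session
end

# Only mock_user_id is copied; the ordinary "page" parameter is ignored.
session = apply_mock_params({ 'mock_user_id' => '42', 'page' => '2' }, {})
puts session.inspect
```

Anchoring the pattern to the start of the name means a parameter such as mock_is_mock_user keeps the second "mock_" in its session key.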
— Max Schubert
Class variables in Rspec tests - reloads will happen during testing! · 5 February 2013, 13:20
As is the case when a Ruby application runs in a multi-process web container (Phusion Passenger, for example), classes can be reloaded during your RSpec test runs. This means any class-level variables that are set cannot be counted on across tests, even within the same describe block.
We have a few classes that act as API wrappers for external services that we like to be able to have off by default in our test environment and on by default in production – we initially used class variables for these, but found that the state was getting reset across runs.
Fix: move the initial states into config/application.rb and config/environments/* and use those to initialize the cattr_accessor definitions at the top of the classes.
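A minimal sketch of the idea in plain Ruby, so it runs without Rails. Everything named here is hypothetical: AppSettings stands in for values set in config/application.rb and config/environments/*, APP_ENV stands in for the framework's environment setting, and a plain class-level attr_accessor stands in for cattr_accessor.

```ruby
# Hypothetical stand-in for per-environment configuration values.
module AppSettings
  EXTERNAL_API_ENABLED = { 'test' => false, 'production' => true }.freeze

  def self.external_api_enabled?(env)
    EXTERNAL_API_ENABLED.fetch(env, false)
  end
end

# Assumption: normally taken from the framework's environment setting.
APP_ENV = 'test'

# Hypothetical API wrapper. The class-level flag is initialized from
# configuration when the class body is evaluated, so a reload restores the
# configured default instead of silently losing state set elsewhere.
class ExternalServiceWrapper
  class << self
    attr_accessor :enabled
  end

  self.enabled = AppSettings.external_api_enabled?(APP_ENV)

  def self.call
    enabled ? :sent_to_service : :skipped
  end
end

puts ExternalServiceWrapper.call
```

A test that needs the wrapper switched on can still set the flag explicitly in its own setup; the point is that the default no longer depends on state surviving a reload.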
— Max Schubert
Nagios deep dive: retention.dat and modified_attributes · 23 November 2010, 07:26
When Nagios core (the daemon, typically started by a script in /etc/init.d/) starts up, it follows a rather involved process to turn the configuration files and the domain-specific language (DSL) contained within them into in-memory objects. The 10,000-foot view of this process is:
- Parse and validate nagios.cfg
- Parse all text-based configuration files (or read the specified objects pre-cache file) based on the cfg_file, cfg_dir, and precached_object_file directives in nagios.cfg and the command line options passed to Nagios, validating the syntax of each file as it is read.
- Parse retention.dat and load desired persistent object attributes into memory for any objects that exist based on the flat configuration file or objects.pre-cache contents read in from previous steps (any objects stored in retention.dat that do not have counterparts in the objects pre-cache file or Nagios DSL-based configuration files are ignored).
modified_attributes tells Nagios which attributes of an object should be loaded into memory as Nagios reads object state from retention.dat; the code that uses this field (all DSL-related code is in the xdata/ directory of the source tree) uses bit flags to record and determine which attributes should be read into memory for an object and which should be ignored.
From include/common.h:
    #define MODATTR_NONE                            0
    #define MODATTR_NOTIFICATIONS_ENABLED           1
    #define MODATTR_ACTIVE_CHECKS_ENABLED           2
    #define MODATTR_PASSIVE_CHECKS_ENABLED          4
    #define MODATTR_EVENT_HANDLER_ENABLED           8
    #define MODATTR_FLAP_DETECTION_ENABLED          16
    #define MODATTR_FAILURE_PREDICTION_ENABLED      32
    #define MODATTR_PERFORMANCE_DATA_ENABLED        64
    #define MODATTR_OBSESSIVE_HANDLER_ENABLED       128
    #define MODATTR_EVENT_HANDLER_COMMAND           256
    #define MODATTR_CHECK_COMMAND                   512
    #define MODATTR_NORMAL_CHECK_INTERVAL           1024
    #define MODATTR_RETRY_CHECK_INTERVAL            2048
    #define MODATTR_MAX_CHECK_ATTEMPTS              4096
    #define MODATTR_FRESHNESS_CHECKS_ENABLED        8192
    #define MODATTR_CHECK_TIMEPERIOD                16384
    #define MODATTR_CUSTOM_VARIABLE                 32768
    #define MODATTR_NOTIFICATION_TIMEPERIOD         65536
The default value for modified_attributes is 0, which means: ignore all attributes from retention.dat that have counterpart constants in common.h.
When an object’s state for the fields listed is changed as Nagios runs, Nagios changes the value of the modified_attributes field to include the constant that represents the field; this allows the retention.dat parsing code to know which attributes to read into memory as an object is parsed from retention.dat into memory when Nagios starts.
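As a rough illustration of the read side, the sketch below decodes a modified_attributes value into the attribute names whose bits are set. It is written in Ruby rather than C, uses only the first four values from the list in common.h, and is not taken from the Nagios source; the helper name is made up.

```ruby
# A subset of the MODATTR_* values from include/common.h.
MODATTR = {
  notifications_enabled: 1,
  active_checks_enabled: 2,
  passive_checks_enabled: 4,
  event_handler_enabled: 8
}.freeze

# Return the names of the attributes whose bit is set in the mask.
# modified_attribute_names is a hypothetical helper, not a Nagios function.
def modified_attribute_names(mask)
  MODATTR.select { |_name, bit| (mask & bit) != 0 }.keys
end

p modified_attribute_names(0) # prints []
p modified_attribute_names(3) # prints [:notifications_enabled, :active_checks_enabled]
```

With a mask of 0 nothing is selected, which matches the default behaviour of ignoring the retained values for these fields.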
A common use case showing this process:
- User logs into the Nagios UI and disables active checks for a host along with notifications
When these two actions are processed, Nagios core changes modified_attributes to indicate that the notifications_enabled and active_checks_enabled fields were changed from their default values. It does so by setting modified_attributes to 3, which is the result of code similar to this:
    modified_attributes |= MODATTR_NOTIFICATIONS_ENABLED;
    modified_attributes |= MODATTR_ACTIVE_CHECKS_ENABLED;
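The arithmetic is easy to check outside Nagios. This Ruby sketch repeats the two OR operations using the values from common.h; mark_modified is a hypothetical helper name, not something from the Nagios source.

```ruby
MODATTR_NOTIFICATIONS_ENABLED = 1
MODATTR_ACTIVE_CHECKS_ENABLED = 2

# OR each flag into the mask. Setting a flag that is already set
# leaves the mask unchanged, so repeated changes are harmless.
def mark_modified(mask, *flags)
  flags.reduce(mask) { |acc, flag| acc | flag }
end

mask = mark_modified(0, MODATTR_NOTIFICATIONS_ENABLED, MODATTR_ACTIVE_CHECKS_ENABLED)
puts mask # prints 3
puts mark_modified(mask, MODATTR_ACTIVE_CHECKS_ENABLED) # prints 3
```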
When Nagios is stopped, it serializes all objects from memory to disk – the modified_attributes attribute is one of the attributes written to disk.
As part of our current distributed Nagios implementation, our team writes out its own retention.dat files based on state for Nagios objects stored in a database. Knowing how modified_attributes works fixed a long-standing bug in our code that was causing attributes for hosts and services that had been modified in-flight to be ignored when Nagios started. We hope this short article helps you avoid the same bug.
Special thanks to my managers Mike Fischer and Eric Scholz at Comcast (a great place to work as a developer!) for allowing me to share information learned while at work based on our use of open source software with the community – and special thanks to Ryan Richins for his work with me on uncovering the cause of this bug in our custom Nagios configuration distribution code.
— Max Schubert
Nagios Performance Tuning - use the RAM (but be careful!), Luke · 5 January 2010, 22:04
We found that migrating as many queues and files as we reasonably can within our Nagios architecture to RAM disks makes a huge difference to the performance of a large Nagios installation. We currently poll more than 15k services on more than 2k hosts in less than 5 minutes, 24×7×365.
We use RHEL5; by default RHEL mounts /dev/shm as a RAM disk with 50% of physical RAM available to the partition.
Our opinion on using RAM disks for temporary storage is controversial; a number of users on the Nagios users and developers lists have told me that disks with big caches should be as fast as RAM as files are cached in RAM, but our experience has shown that nothing beats a RAM disk for a fast queue directory or file. Our experiences also taught us that when moving queues to RAM it is very important to also implement supporting code that ensures important data is persisted across reboots or can easily be re-created across reboots.
Our experience is based on machines with SCSI disks in RAID 0, 5, and 1+0 configurations.
Queues and files we moved to RAM that sped up our Nagios architecture noticeably (by over 40% in total):
Nagios (nagios.cfg)
- log_file
- object_cache_file
- status_file
- temp_file
- temp_path
- check_result_path
- state_retention_file
Moving log_file, object_cache_file, and status_file to RAM speeds up the CGIs in a larger environment. Moving temp_file, temp_path, check_result_path, and state_retention_file to RAM lowers the latency for Nagios in a larger environment.
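As a sketch of what the resulting nagios.cfg entries could look like, the Ruby snippet below renders the seven directives with paths under /dev/shm. The /dev/shm/nagios root and the file and directory names are assumptions chosen for illustration; substitute whatever your installation uses.

```ruby
# Assumed location on the RAM disk; not taken from the article.
RAM_ROOT = '/dev/shm/nagios'

# Directive names are from the list above; the file and directory
# names on the right are illustrative assumptions.
RAM_DIRECTIVES = {
  'log_file'             => 'nagios.log',
  'object_cache_file'    => 'objects.cache',
  'status_file'          => 'status.dat',
  'temp_file'            => 'nagios.tmp',
  'temp_path'            => 'tmp',
  'check_result_path'    => 'checkresults',
  'state_retention_file' => 'retention.dat'
}.freeze

# Render one "directive=path" line per entry, as nagios.cfg expects.
def ram_disk_directives(root = RAM_ROOT)
  RAM_DIRECTIVES.map { |name, file| "#{name}=#{File.join(root, file)}" }
end

puts ram_disk_directives
```

Note that temp_path and check_result_path are directories, so they need to be created on the RAM disk (for example at boot) before Nagios starts.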
We have also taken the radical step of moving all configuration files into RAM, as well as plugins. We use ePN extensively, and every time Nagios goes to run an ePN plugin it checks to see whether the plugin has changed. After moving plugins to RAM we noticed a speed-up.
IMPORTANT NOTE – Do not move everything to RAM without putting in custom, periodic scripts or other processes that back up important files from RAM to real disk so that if the host crashes they can be quickly recovered or re-created!
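A minimal sketch of such a backup step, assuming it is run periodically (from cron, for example). The function name and any paths you pass in are hypothetical; this is not the script we run, just one way the copy could be done.

```ruby
require 'fileutils'

# Copy each listed file from the RAM disk to a directory on real disk.
# Returns the names of the files that were actually backed up.
def back_up_ram_files(ram_dir, backup_dir, names)
  FileUtils.mkdir_p(backup_dir)
  names.select do |name|
    source = File.join(ram_dir, name)
    next false unless File.file?(source)

    # Copy to a temporary name first, then rename, so a crash mid-copy
    # never leaves a truncated backup in place of a good one.
    target = File.join(backup_dir, name)
    FileUtils.cp(source, "#{target}.tmp", preserve: true)
    File.rename("#{target}.tmp", target)
    true
  end
end
```

A call might look like back_up_ram_files('/dev/shm/nagios', '/var/backups/nagios', ['retention.dat']), with both directories being placeholders. A matching restore step at boot, before Nagios starts, would copy the files back in the other direction.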
SNMPTT (snmptt.ini)
The spool file for checks is a good one to move to RAM and speeds up processing.
PNP (npcd.conf and process_perfdata.conf)
The NPCD queue is another directory we moved to RAM, and we noticed a nice improvement in NPCD processing speed.
Summary
Moving any of the above queues to RAM disks will increase the overall speed of your Nagios architecture; the Nagios-specific configuration changes make a very noticeable difference, but at the price of some additional supporting code to ensure the robustness of critical data. We developed this list over a period of 3-6 months, so take your time if you decide to implement any of the changes mentioned in this article. Also make sure you have Nagios trending metrics in place beforehand so you can see what kind of difference the above changes make, if any, to your installation.
Special thanks to my managers Eric Scholz, Mike Fischer, and Jason Livingood for allowing us to share our experiences and knowledge with the general public, and extra special thanks to my teammates Ryan Richins and Shaofeng Yang for their work with me in creating an ever-changing and improving Nagios architecture that is stable and gives us incredible performance.
We are still hiring :), contact me if you are interested in working on a terrific team doing interesting and innovative work.
— Max Schubert
Updated Nagios::Plugin::SNMP and Nenm::Utils on GitHub (on CPAN this week) · 26 August 2009, 19:19
I have released version 1.2 of Nagios::Plugin::SNMP to GitHub:
http://github.com/perldork/nagios--plugin--snmp/tree/master
This release includes:
- Many bug fixes
- Delta processing for SNMP counters, with a framework that allows you to plug in your own delta calculation routine! This version requires Cache::Memcached (and a memcached instance somewhere on the network the code can reach) to do delta processing. Delta processing itself, however, is optional, so you do not need Cache::Memcached installed to use the module without the delta processing features.
- Clustered SNMP-agent aware code: for cases where one agent out of N will have a specific OID or OIDs, you can specify multiple hosts to Nagios::Plugin::SNMP and it will try to retrieve the OID from each host listed in turn; it will only die with an error if all hosts fail to return the requested OID.
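The clustered lookup described in the last item can be sketched as a simple try-each-host loop. This is an illustration in Ruby, not the module's Perl code: first_host_value is a made-up name, and fetch is a hypothetical callable standing in for a real SNMP GET.

```ruby
# Ask each host in turn for an OID and return [host, value] for the first
# host that answers. `fetch` is a hypothetical callable standing in for a
# real SNMP GET; it returns the value, or returns nil or raises when the
# host cannot supply the OID.
def first_host_value(hosts, oid, fetch)
  errors = []
  hosts.each do |host|
    begin
      value = fetch.call(host, oid)
      return [host, value] unless value.nil?
      errors << "#{host}: no value for #{oid}"
    rescue StandardError => e
      errors << "#{host}: #{e.message}"
    end
  end
  # Only fail once every host has been tried.
  raise "all hosts failed: #{errors.join('; ')}"
end
```

Collecting the per-host errors before raising keeps the final message useful when every agent in the cluster fails for a different reason.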
Additionally, I have released an updated version of the Nenm::Utils module that I initially created for the Syngress Nagios book project I led. This version includes:
- Multiple bug fixes
- More flexible threshold processing
This module is also available on the book site.
My team at work uses both of these modules extensively to query several thousand SNMP-based agents every 5 minutes.
Special thanks to:
My teammates Ryan Richins and Shaofeng Yang for their extensive contributions to both of these modules.
My managers at Comcast, Mike Fischer and Jason Livingood, for allowing us to contribute code we have done at work back to the open source community.
Comcast is hiring! Our team is looking for a talented developer with systems administration experience to join our team. Let me know if you are in the northern Virginia area of the US and are looking for a fun and challenging place to work :).
— Max Schubert
Nagios Performance Tuning: Early Lessons Learned, Lessons Shared. Part 5 - Circular Dependency Checking · 6 August 2009, 12:36
NOTE – we are using Nagios 3.0.3, which does not have the very cool patch for the circular dependency checking algorithm recently introduced into the Nagios 3.1.x release tree.
Our startup times for our Nagios instances jumped dramatically today (more than 6x) due to some of our users adding large numbers of new services to their hosts that are associated with their hosts through the
service -> hostgroup -> host
relationship I have discussed often and that we make use of often. We always want our Nagios instance to start on a 5 minute interval as we push most of the performance data we get back from checks into a long-term trending data warehouse.
We also test every configuration release in an integration and test environment before doing a deployment.
With this in mind, we decided to try turning off circular dependency checking on startup for our production Nagios instances.
On one instance this reduced startup time from 763 (!) seconds to 16 seconds; on the other, startup time dropped from 158 seconds to 6 seconds.
There you have it, a simple way to dramatically reduce startup times, but again, only do this if you test your configuration beforehand in an environment with circular dependency checking on.
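For readers unfamiliar with what a circular dependency check involves, here is a generic depth-first sketch in Ruby. It is not the algorithm Nagios uses and does not read Nagios configuration; the graph is a plain hash and the object names in the usage lines are invented. It only illustrates the kind of graph walk involved, which grows with the number of objects and relationships to visit.

```ruby
# Detect a cycle in a dependency graph given as { node => [dependencies] }.
def circular_dependency?(graph)
  state = {} # node => :visiting while on the current path, :done afterwards
  visit = lambda do |node|
    return true if state[node] == :visiting # reached a node already on the path
    return false if state[node] == :done    # already fully explored

    state[node] = :visiting
    found = graph.fetch(node, []).any? { |dep| visit.call(dep) }
    state[node] = :done
    found
  end
  graph.keys.any? { |node| visit.call(node) }
end

puts circular_dependency?('web' => ['db'], 'db' => ['san'], 'san' => ['web']) # prints true
puts circular_dependency?('web' => ['db', 'san'], 'db' => ['san'], 'san' => []) # prints false
```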
— Max Schubert
Easy to use ruby library for interacting with Confluence - confluence4r · 31 July 2009, 12:43
http://confluence.atlassian.com/display/CONFEXT/Confluence4r
I added a gemspec for the package to the bottom of the page if you want to build it as a gem in-house.
— Max Schubert
Why do I get an 'uninitialized value' error message from Getopt/Long.pm when Nagios runs my Perl-based plugin under ePN? · 25 July 2009, 10:48
I got this message while debugging an ePN-based script today:
**ePN /data/nagios/etc/customers/tean/project/plugins/check_plugin_name.pl: "Use of uninitialized value in pattern match (m//) at /usr/lib/perl5/5.8.8/Getopt/Long.pm line 848,".
I was very puzzled by this, as I had never seen that error before. We run 20-30 or more ePN-based scripts, and obviously I don’t maintain Getopt::Long, so how could I have introduced a bug into it?
Answer: I didn’t. What I did do was define a custom attribute for a service but not put any spaces after the attribute name in my service definition. E.g.
    define command {
        command_name    check_plugin_name
        command_line    $USER10$/team/project/plugins/check_plugin_name.pl \
                        --check-interval $_SERVICE_PROJECT_CHECK_INTERVAL$ \
                        --hostname $HOSTADDRESS$ \
                        $_SERVICE_PROJECT_ALT_HOSTS$ \
                        -p '$_HOST_SNMP_PORT$' \
                        --snmp-version 2c \
                        --rocommunity $_HOST_SNMP_COMMUNITY$ \
                        --timeout $_HOST_PLUGIN_TIMEOUT$ \
                        -c '$_SERVICE_PROJECT_CRIT$' \
                        $_SERVICE_PROJECT_WARN$
    }
Notice that at the end of the command line I reference $_SERVICE_PROJECT_WARN$. This style of custom attribute lets the user set a warning threshold in the service definition if they want to, like so:
    define service {
        ...
        __project_warn    -w my_threshold_specification
        ...
    }
But if they don’t, no changes are needed to the command definition to let it work as the command does not require a warning threshold.
However I then defined the attribute like so in my service definition:
    define service {
        ...
        __project_warn<-- end of line, no spaces!
        ...
    }
This caused Nagios to substitute a null or some other non-printable character as the value of the attribute in the command line before executing it, which in turn got passed through to Getopt/Long.pm as an undefined option name.
The fix: just add spaces and an empty string to the attribute in the service definition :)

    define service {
        ...
        __snmp_port       161
        __project_warn    ''
        ...
    }
Voila, no undefined option.
This could be a candidate for either a Nagios custom attribute value fix or a Getopt/Long.pm fix. I am thinking Getopt::Long should set an undefined option name to the empty string so that developers do not have to guard against this condition.
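Until something like that exists, a plugin can guard against the condition itself by dropping unusable arguments before option parsing. The sketch below shows the idea in Ruby with the standard OptionParser library rather than Perl and Getopt::Long; clean_args and parse_plugin_args are made-up names, and the two options are only examples.

```ruby
require 'optparse'

# Drop arguments that are nil, empty, or made up solely of whitespace or
# non-printable characters, such as an unset custom attribute might yield.
def clean_args(argv)
  argv.reject { |arg| arg.nil? || arg !~ /[[:graph:]]/ }
end

# Parse a small, illustrative set of plugin options from the cleaned list.
def parse_plugin_args(argv)
  options = {}
  OptionParser.new do |opts|
    opts.on('-w', '--warning SPEC') { |value| options[:warning] = value }
    opts.on('-c', '--critical SPEC') { |value| options[:critical] = value }
  end.parse(clean_args(argv))
  options
end

# The trailing empty and NUL arguments are discarded before parsing.
p parse_plugin_args(['-c', '10', '', "\0"])
```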
— Max Schubert
Nagios patch withdrawal: only send recovery escalation notifications for services if a problem escalation notification was sent · 24 July 2009, 13:16
Well, I hate to say it, but mea culpa: I had to withdraw the first attempt at the patch from an earlier article (which I have hidden for now to make sure others do not download it) that was supposed to fix escalation recovery notification behavior.
My first attempt at the patch was overly naive; if you downloaded it, please remove it from your installation as it will most likely not work for you. It does work for us, but our configuration is unusual and very different from how most people use Nagios.
I have a new version in place at my job and I will be releasing that version next week or the week after next. Why might you trust this new one after my poor first attempts?
- The bugs in it were found through a team code review, so now 3 sets of eyes have looked at the code and they will look at it again before I release to the public.
- I have tested, and will test again, the patch with configurations that resemble how most people use Nagios, in addition to our own unusual configuration, to ensure the patch works for the vast majority of Nagios systems.
My apologies if you downloaded and used the earlier patches; thankfully they will not corrupt data, they just do not do what I promised they would do.
The current version is working for us and with typical configurations as well. I just do not want to repeat the mistakes I made last time, as I know how frustrating it is to back out code.
— Max Schubert
Are you an expert US citizen? · 19 June 2009, 14:59
Email from a recruiter this year included a request for the following skill:
- US Citizen – 10+ years of experience Expert Required
— Max Schubert