Coffee & Beer

Rantings and Ravings of the technical sort

Vim Trick of the Day

So, we have lots and lots of puppet manifests, and some of them are cleaner to read than others. I find myself re-indenting things all the time, and thought, hmm, there must be a better way. So, after a bit of searching, I added this to my ~/.vimrc:

autocmd BufWritePre,BufReadPost   *.pp    :normal gg=G

Now, whenever I open or save a puppet manifest, the whole buffer is re-indented. So, if I’m editing some previously written file, or am myself sloppy, it all gets cleaned up as soon as I open/save the file.
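One wrinkle: gg=G leaves the cursor at the end of the buffer. A variant worth trying (standard mark usage, not something from my actual vimrc) drops a mark first and jumps back after the re-indent:

```vim
" drop mark z, re-indent the whole buffer, jump back to mark z
autocmd BufWritePre,BufReadPost *.pp :normal mzgg=G`z
```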

Win

Rsync vs Tar

So, I’ll spare the gory details (you can read about some of them at the boss’s blog, here and here), but we at work had the need to shift a pretty sum of data (80TB or so) as fast as possible to recover from a hardware failure that impacted one particular group. Oh, and the 80TB is millions upon millions of tiny files.

Now, we were using rsnapshot, so there was no “restore” to be done to the files; we could access them straight away, and even exported them read-only while we came up with a plan, so users at least had access.

Once the plan was formed, and destination storage quickly provisioned (hello smaller ext4 LVM volumes instead of monolithic xfs), came the task of copying the data over, and fast.

Rsync is always a friend and a bit of a good standby for moving data around, especially in situations where you want to preserve everything about the files and have the ability to stop and restart transfers. Obviously we were not going to do this over ssh (ouch), so the target storage areas were mounted on the source systems, and rsyncs started running locally, a la:

rsync -av --progress /mnt/backups/section01/ /mnt/destination01/section01

Simple enough.

We had three 16-core, 48GB source systems and five 8-core, 24GB destination systems, so we spun up a few rsyncs per source system and let them roll overnight.

This morning, not nearly as much data had moved as we had hoped.

Why? Because rsync needs to stat each file, on both ends, as it rolls along. Big files flew over the wire, but most of what we were moving were tiny, tiny files, and as I said, millions and millions of them.

So, we did:

cd /mnt/backups/
tar -cf - section01 | tar -xf - -C /mnt/destination01/ 

Background that, and crank up a few others on the other sections, and things started going MUCH faster. We wanted a little faster, so:

Source:

mount -o remount,noatime /mnt/backups

Destination:

mount -o remount,noatime /mnt/destination01

Boom. With noatime, access times are no longer updated on every read (we don’t care about them at this point), so far fewer metadata operations again.
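Putting it together, the per-section fan-out looks roughly like this; the paths and section names are placeholders standing in for our real layout, not a verbatim copy of what we ran:

```shell
#!/bin/sh
# Sketch: one backgrounded tar pipe per section (section names are hypothetical)
copy_section() {
    src_root="$1" ; dst_root="$2" ; section="$3"
    ( cd "$src_root" && tar -cf - "$section" | tar -xf - -C "$dst_root" )
}

# Fan out one copy per section, then wait for them all to finish
if [ -d /mnt/backups ] ; then
    for section in section01 section02 section03 ; do
        copy_section /mnt/backups /mnt/destination01 "$section" &
    done
    wait
fi
```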

Moving Puppet From Subversion to Git in 15 Minutes While Adding Dynamic Environments

For (almost) as long as we’ve had a puppet installation at work, we’ve had it in Subversion to track changes. This has changed/evolved a few times, but has always remained in Subversion in one way or another. Recently we started tracking other bits (documentation, scripts, etc) in git, and the idea of being on 2 different revision systems didn’t really sit well with me. Most of the team has taken up git very well, so the choice was made to move our puppet manifests etc to git. Once I started looking into it, I also found the very cool “Dynamic Environments w/ git branches” trick, talked about here and here, and thought the move to git would be the perfect time to move to this.

I’m going to skip the testing/waiting I had to do (making sure our git server and puppet master could talk ssh to each other, setting up the ssh keys for git to use to push and puppet to pull, etc), and jump right into the implementation and transition, which took about 15 minutes total yesterday.

So, first, to get the lion’s share of the data shifting done, I used svn2git to get the svn repo sync’d to a git repo. Our svn repo, while it used to have a few branches, prior to this had been compacted to just a trunk (which wasn’t called “trunk”) and no tags etc. So, I ran:

$> svn2git --username matt --nobranches --notags --rootistrunk -v https://puppet.server/svn/puppet/

And let that chug along for a while.

Once that was done I could add the remote git server as an origin:

$> git remote add origin git@git.server:puppet.git
$> git push

Now, the puppet svn repo is in git and on the git server. I had to sync a few times due to some quick changes going into the svn repo, so that was a simple:

$> svn2git --rebase
$> git push

Okay, finally it was go time. Step 1: make the svn repo read-only. How? This is served with apache, so a simple pre-commit hook did the job:

#!/bin/sh
echo "This svn repo is now read-only! No Commits accepted!"
exit 1

See what I did there? The pre-commit never DOES anything with the commit, so the commit is never accepted by the svn server!

Okay, on to the transition. On our puppet master, everything lived in /etc/puppet, which was an svn repo. Let’s stop the puppet master for a moment (clients will go about their merry way), move that aside, and set up the new location:

$> service httpd stop     # we run puppet via passenger
$> cd /etc
$> mv puppet puppet.svn.backup
$> mkdir -p puppet/environments
$> chown -R puppet:apache puppet
$> chmod -R g+w puppet

Okay. Now, let’s get into the weeds for a second here. Using the dynamic environments, our git server will, with a post-receive hook I’ll show in a second, ssh to our puppet master and check out the branch (master or otherwise) to /etc/puppet/environments/$BRANCH. Now, we tend to use “master” for most of our small edits (adding a node, etc), so “master” => “production”. I didn’t want a puppet environment called “master”, so a tiny bit of extra logic was added to the post-receive to change the location of the branch checkout to “production” if the master branch was changed, but otherwise use the branch name, such as “matts_new_feature”. Okay, here is the post-receive:

#!/bin/sh
read oldrev newrev refname

REPO="git@git.server:puppet.git"
BRANCH=`echo $refname | sed -n 's/^refs\/heads\///p'`
BRANCH_DIR="/etc/puppet/environments"
SSH_ARGS="-i /var/lib/puppet/.ssh/id_puppet_rsa"
SSH_DEST="puppet@puppet.server"

#If working on the master branch, its really production and should go in environments/production
if [ "$BRANCH" == "master" ] ; then
        BRANCH_REAL="production"
else
        #Otherwise its a non-master/production branch and the env can be created w/ the branch name
        BRANCH_REAL="$BRANCH"
fi

if [ "$newrev" -eq 0 ] 2> /dev/null ; then
  # branch is being deleted
  echo "Deleting remote branch $BRANCH_DIR/$BRANCH_REAL"
  ssh $SSH_ARGS $SSH_DEST /bin/sh <<-EOF
  cd $BRANCH_DIR && rm -rf $BRANCH_REAL
EOF
else
  # branch is being updated
  echo "Updating remote branch $BRANCH_DIR/$BRANCH_REAL"
  ssh $SSH_ARGS $SSH_DEST /bin/sh <<-EOF
  { cd $BRANCH_DIR/$BRANCH_REAL && git pull origin $BRANCH ; } \
  || { mkdir -p $BRANCH_DIR && cd $BRANCH_DIR \
  && git clone $REPO $BRANCH_REAL && cd $BRANCH_REAL \
  && git checkout -b $BRANCH origin/$BRANCH ; }
EOF
fi 

This, paired with a nice pre-receive server-side syntax check (I’ll share that in another post), and we’re looking pretty good and automatic. Okay, so that’s in place. Now, svn2git works great, but I want to use a clean, clean git-only checkout of the new puppet repo to finish this off with, so, on my local system:

$> mv puppet puppet.svn-git
$> git clone git@git.server:puppet.git puppet.git
$> cd puppet.git
$> vi puppet.conf

Now, in puppet.conf, I setup the dynamic environments:

environment = production
manifest = $confdir/environments/$environment/manifests/site.pp
modulepath = $confdir/environments/$environment/modules

Now let’s commit/push:

$> git status
# On branch master
# Changes not staged for commit:
#   (use "git add <file>..." to update what will be committed)
#   (use "git checkout -- <file>..." to discard changes in working directory)
#
# modified:   puppet.conf
#
no changes added to commit (use "git add" and/or "git commit -a")

$> git commit -a
[master d846620] change to puppet.conf for the environments change
  1 files changed, 7 insertions(+), 7 deletions(-)

$> git push
Counting objects: 5, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (3/3), done.
Writing objects: 100% (3/3), 383 bytes, done.
Total 3 (delta 2), reused 0 (delta 0)
remote: Updating remote branch /etc/puppet/environments/production
remote: Cloning into production...
To git@git.server:puppet.git
      73f5405..d846620  master -> master

Ta Da! Now, let’s look on the puppet master:

$> pwd
/etc/puppet/
$> ls environments
production

So, it’s there, let’s start puppet back up:

$> service httpd start

Get on a client or two, run puppet agent -t, all is well, and it’s great success! This literally (of course, with testing, setting up keys, etc out of the way) took about 15 minutes yesterday. Nothing like pulling the rug out from under 1800+ systems/14000 cores and putting it back without them noticing!

Now, the really cool part is:

$> git checkout -b matts_test
$> vim somefiles.pp
$> git commit -a
$> git push

This will make a new branch/environment in /etc/puppet/environments/matts_test, which clients can use like puppet agent -t --environment=matts_test! Want it to go away after changes have merged? Simple!

$> git checkout master
$> git merge matts_test
$> git push origin :matts_test
$> git branch -d matts_test
$> git push

Ta da! The branch/environment is no longer valid, and has been removed from /etc/puppet/environments/!

Mirroring Ubuntu on CentOS

Been building a new (internal) mirror host @ work, and it’s been an adventure. This isn’t a public mirror, and we’re not interested in mirroring all releases of Distro X, but rather only a handful or one. We basically want to mirror:

  • CentOS 5 & 6
  • EPEL 5 & 6
  • PuppetLabs Repos for el5 & 6
  • Ubuntu LTS Release(s)

Now, the mirror box is a CentOS 6 host, so this is pretty simple when it comes to the CentOS mirrors; that’s coming in another post. We do some funky things like wanting a fully sync’d mirror, but also point-in-time snapshots so we have a static source to build compute nodes from (while storage/webservers get the up-to-date repos).

In any case, building an Ubuntu mirror, other than just rsync'ing archive.ubuntu.com/ubuntu, on a non-Ubuntu/Debian host is non-trivial it seems, just like using something other than rsync to mirror yum repos on something w/o yum installed would be awkward. However, I’ve found a solution:

debmirror

Debmirror is really just a Perl script, and I wrapped it up in a script for our setup. Took a little bit to get going, though:

yum install perl-libwww perl-Compress-Zlib perl-Digest-SHA1 perl-Net* rsync perl-LockFile-Simple perl-Digest-MD5-M4p
wget http://archive.ubuntu.com/ubuntu/pool/universe/d/debmirror/debmirror_2.10ubuntu1.tar.gz
tar -xzvf debmirror_2.10ubuntu1.tar.gz
...
cd debmirror-2.10ubuntu1
make
cp debmirror /usr/local/bin/
cp debmirror.1 /usr/share/man/man1/
cpan install Net::INET6Glue    # couldn't find this in yum

Okay, debmirror now works, but needs that wrapper script:

(ubuntu_mirror.sh)
#!/bin/bash
arch=amd64
section=main,restricted,universe,multiverse
release=lucid
server=us.archive.ubuntu.com
inPath=/ubuntu
proto=http
proxy=http://proxy.local:8888
outPath=/var/www/repos/ubuntu

debmirror       -a $arch \
                --no-source \
                -s $section \
                -h $server \
                -d $release \
                -r $inPath \
                --progress \
                --ignore-release-gpg \
                --no-check-gpg \
                --proxy=$proxy \
                -e $proto \
                $outPath

Make that executable, run it, and we’re off! First sync is running now. Next up is to see how quickly a resync happens, and schedule it to run daily. Hello internal Ubuntu Mirror!
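For the daily run, a cron drop-in along these lines would do it; the script path, schedule, and log location here are my assumptions, not the actual setup:

```
# /etc/cron.d/ubuntu-mirror -- nightly sync (path and time are assumptions)
30 2 * * * root /usr/local/bin/ubuntu_mirror.sh >> /var/log/ubuntu_mirror.log 2>&1
```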

The really nice thing is when the next LTS comes out (Precise Pangolin), I can simply add it to the list of releases, like release=lucid,precise, and we’ll start mirroring that!

Next up, convoluted ways of building many, many CentOS mirrors on one box for various reasons.

Getting Kvm Domain Info Into Puppet Facts

While I’ve yet to find the holy grail of getting a KVM guest to be aware of which ‘dom0’ it’s being hosted on, this bit of ruby for a custom puppet fact is a step there, the other way around:

(vms-running.rb)
require 'facter'
begin
        Facter.compute_node
rescue
        Facter.loadfacts()
end


unless  `rpm -qa | grep ruby-libvirt`.empty?
        if (Facter.value("puppet_class_kvm_host") == "true") then
                Facter.add("kvm_vms") do
                        setcode do
                                require 'libvirt'
                                conn = Libvirt::open('qemu:///system')
                                if (conn.num_of_domains == 0 ) then
                                        kvm_vms = "NO VMS"
                                        conn.close
                                        kvm_vms
                                else
                                        vm_doms = Array.new
                                        conn.list_domains.each do |domid|
                                                dom = conn.lookup_domain_by_id(domid)
                                                vm_doms.push(dom.name)
                                        end
                                        vm_doms=vm_doms.join(" ")
                                        conn.close
                                        vm_doms
                                end
                        end
                end
        end
end

Which will return a nice fact like:

kvm_vms => Host1 Host2 Host3

or if none are running:

kvm_vms => NO VMS

It’s ugly, but it works!
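As for the “which host runs guest X” question: once this fact is being reported, the puppet master keeps each node’s facts cached as YAML, so a grep gets you most of the way there. The cache path below is the common default and is an assumption about your install, as is the guest name:

```shell
#!/bin/sh
# Sketch: print which fact-cache file (i.e. which KVM host) lists a given guest.
# /var/lib/puppet/yaml/facts is the usual cache location on the master (assumed).
find_host_for_guest() {
    guest="$1" ; cache_dir="$2"
    grep -l "kvm_vms: .*${guest}" "$cache_dir"/*.yaml
}

if [ -d /var/lib/puppet/yaml/facts ] ; then
    find_host_for_guest someguest /var/lib/puppet/yaml/facts
fi
```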

Finding Compute Resources With Puppet Facts

One question we get asked at work a good bit centers around finding compute resources that match a specific set of specifications. Since the compute systems we have are of all sorts of various ages and hardware types, jobs that need 8GB of memory might not work everywhere, but most places. Same goes for a specific number of cores (or, minimum #) per node. Some have 4, some have 12+.

Well, facter, and therefore puppet, knows about these things. And there is a REST api we can query. And I needed an excuse to do a little ruby:

(find_compute.rb)
#!/usr/bin/env ruby

require 'yaml'
require 'puppet'

puts "Welcome to the Compute Finder script!"
puts "This script aims to help you locate compute nodes based on simple requirements"
puts "Such as the minimum amount of RAM or # of processor cores"
puts "---"
puts "---"
puts "What minimum amount of ram would you like?"
puts "(in GB, leave blank for no minimum)"
mem=gets.chomp
puts "What minimum number of processor cores would you like?"
puts "(leave blank for no minimum)"
procs=gets.chomp

sleep 1

search ="facts.compute_node=true"
if mem != ""
  search << "&facts.memorysize.ge=#{mem}"
end

if procs !=""
  search << "&facts.processorcount.ge=#{procs}"
end

if (procs == "" and mem == "" )
  puts "You must specify some requirements!"
  exit
end
puppet_base = "https://puppet:8140"
path="/production/facts_search/search?"


puts "Finding matching nodes, please wait..."
cmd = "curl -s -k -H \"Accept: yaml\" \"#{puppet_base}#{path}#{search}\""
ans = %x[#{cmd}]
nodes = YAML::load(ans)
nodearr = []
nodes.sort.each do |node|
  nodearr.push(node)
end
puts nodearr
  puts "Would you like more info on the nodes found? This may take a bit longer..."
puts "(y/n)"
more=gets.chomp
if more == "y"

  node_array = []
  nodes.each do |node|
      node_url="https://puppet:8140/production/facts/#{node}"
      node_cmd="curl -s -k -H \"Accept: yaml\" #{node_url}"
      node_ans= %x[#{node_cmd}]
      node_array << YAML::load(node_ans)
  end

  results =[]
  node_array.each do |node|
      name = node.name
      facts = node.values
      ram=facts['memorysize']
      procs=facts['processorcount']
      results.push("Host: #{name} RAM: #{ram} ProcessorCores: #{procs}")
  end
  results.sort!
  puts results
end

The output of which looks like:

Welcome to the Compute Finder script!
This script aims to help you locate compute nodes based on simple requirements
Such as the minimum amount of RAM or # of processor cores
---
---
What minimum amount of ram would you like?
(in GB, leave blank for no minimum)
64
What minimum number of processor cores would you like?
(leave blank for no minimum)
12
Finding matching nodes, please wait...
node01.local
node02.local
node03.local
node04.local
Would you like more info on the nodes found? This may take a bit longer...
(y/n)
y
Host: node01.local RAM: 94.28 GB ProcessorCores: 12
Host: node02.local RAM: 94.28 GB ProcessorCores: 12
Host: node03.local RAM: 86.39 GB ProcessorCores: 12
Host: node04.local RAM: 251.89 GB ProcessorCores: 32

Puppet Facts About Puppet Classes

So, obviously, we use puppet a great deal. One thing a few of us have wanted for a while was the ability to see what nodes/node groups would be affected by a change to a particular class. In other words, what hosts get $This class?

Now, nodes are aware of what classes they get, but the puppet master seemingly is not, not really. It compiles the catalog on a per-run basis, and while during that it DOES know what classes a system gets, it doesn’t really store that info. So, a simple custom fact to pull that data in from each host:

(puppet_classes.rb)
# list puppet classes, one per fact
require 'facter'

IO.popen('tail -n+3 /var/lib/puppet/classes.txt').readlines.each do |line|
        Facter.add("puppet_class_#{line.chomp}") do
                setcode do
                        "true"
                end
        end
end

So, now we get, for each host, a bunch of facts like:

puppet_class_syslog => true 

Reported from facter -p, and therefore sent back to the puppet master, into a DB, which we can query directly, via the puppet API, or via Foreman.
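Tying this back to the Compute Finder post above, those class facts can be hit through the same facts_search REST endpoint. A hedged sketch: the endpoint form matches find_compute.rb, the server name is assumed, and certs/auth are left out:

```shell
#!/bin/sh
# Sketch: build the facts_search URL for a puppet_class_* fact
# (same endpoint style as find_compute.rb above; "puppet:8140" is an assumption)
class_query_url() {
    printf 'https://puppet:8140/production/facts_search/search?facts.puppet_class_%s=true' "$1"
}

# Against a live master, something like:
#   curl -s -k -H "Accept: yaml" "$(class_query_url syslog)"
echo "$(class_query_url syslog)"
```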

Ruby for Scripting

So, this isn’t “new”, I wrote it a few weeks back, but, well, meh, figured I’d share. First “useful” ruby-instead-of-bash script for a little automation/keeping things consistent.

@ work we have the need to move a whole mess of storage (just under 100TB) from one thing to another, and to preserve quotas as we do so. Most of the quotas are simple 1TB tree/project quotas. Some are bigger, but by default we want to start @ 1TB and then we can bump the ones that need it. Anyways, I need to do this for about 8 groups of 5-20 “projects/trees”, so by hand isn’t really an option. Hence the script:

(mkscratch.rb)
#Usage: mkscratch <project/group name> <full path to directory>
#Makes the needed project/projid entries, turns on the project quota, and sets a 1TB default hard/soft limit.
if ARGV.length != 2
  puts "usage: mkscratch <project/group name> <full path to directory>"
  puts "This script takes a lab/group name, and a path as arguments"
  puts "It will make a directory for the lab/group at the path, and turn it into a default 1TB project quota"
  exit
end

new_scratch = ARGV[0];
new_path = ARGV[1];
project_nums=Array.new
#first, we need to figure out what project ID to start with
File.open("/etc/projects").reverse_each { |projects|
        project_nums.push(projects.split(':')[0])
}
last_project = project_nums[0].to_i;
new_project = last_project+1
project_line="#{new_project}:#{new_path}"
projid_line="#{new_scratch}:#{new_project}"
mkdir_cmd="mkdir #{new_path}"
quota_set="xfs_quota -x -c 'project -s #{new_scratch}'"
quota_limit="xfs_quota -x -c 'limit -p bhard=1024g bsoft=1024g #{new_scratch}'"

sleep(1)
puts "Adding #{project_line} to /etc/projects..."
File.open("/etc/projects", "a") do |f|
        f.puts(project_line)
end
sleep(1)
puts "Adding #{projid_line} to /etc/projid..."
File.open("/etc/projid", "a") do |f2|
        f2.puts(projid_line)
end

sleep(1)
puts "Making #{new_path} with #{mkdir_cmd}"
Dir::mkdir(new_path)

sleep(1)
puts "Setting quota with #{quota_set}"
system "#{quota_set}"

sleep(1)
puts "Setting quota limits @ 1TB with #{quota_limit}"
system "#{quota_limit}"

So, I can, from the place I want to sync from, do something like:

$ for i in *; do /usr/bin/mkscratch.rb $i /new/location/$i; done

And be done with it. Really basic, I know. But, it’s the first thing I would have normally shell scripted done in ruby instead. And you know what? I liked it. It’s nice. So there.

Follow Up to Lazy: Reboot

Weekend time, and time to do a bit of housecleaning. For a bit the macbook pro has been draining a bit too fast and getting a bit too hot. I’m not convinced it’s hardware. I did an upgrade from Snow Leopard to Lion, and there is a good bit of cruft around. So, I decided to do a nice clean wipe, fresh Lion install, and keep it basic.

How basic?

I normally tend to say/think that all I really need to do 98% of my day to day work is Chrome and a Terminal. One of those is built in. So: a clean reinstall, install Chrome (which I’ve rebased to have very few extensions etc), install RVM, get my gems, pull dotfiles/etc from github, install homebrew for building UNIX apps (thanks to Phil for the recommendation, I already like it far better than macports), get XCode from the app store, and, well, that’s about it. Really.

I’m going for the minimal number of apps installed. No Office if I can get away with it (Google Docs), no Mail.app/Thunderbird (Gmail/Google Apps), no GUI text editors (VIM!). I’m going to try to avoid Dropbox, though that won’t last long, but I’m going to give it a thorough cleaning and move lots to github/a private git server.

That’s the next task up: a git server/repo @ home. Might go KVM on my home server, and stand up a handful of minimal VMs for testing, playing around, separating services a bit.

More to come!

Progress

Made a little progress:

nichols2::LSCI-NICHOLS2-2 { 17:00:53 Fri Sep 30 }
~/Development/rudo-> rake init
RuDo init started, creatding SQLite DB...
nichols2::LSCI-NICHOLS2-2 { 17:00:56 Fri Sep 30 }
~/Development/rudo-> rake new_list["Todo"]
New List "Todo" created with ID#[1]
nichols2::LSCI-NICHOLS2-2 { 17:01:05 Fri Sep 30 }
~/Development/rudo-> rake new_list["Work"]
New List "Work" created with ID#[2]
nichols2::LSCI-NICHOLS2-2 { 17:01:09 Fri Sep 30 }
~/Development/rudo-> rake new_list["Home"]
New List "Home" created with ID#[3]
nichols2::LSCI-NICHOLS2-2 { 17:01:17 Fri Sep 30 }
~/Development/rudo-> rake lists
RuDo has the following lists:
1
Todo
2
Work
3
Home

It works. Sending the push to github for the day.