
jQuery in use upon this blog

Blog Update

I've just updated the home-grown JavaScript I was using on this blog to be jQuery-powered.

This post is a test.

I'll need to check but I believe I'm almost 100% jQuery-powered now.

AJAX Proxies

It is well known that AJAX requests may only be made to the server the JavaScript was loaded from: the so-called same-origin security restriction.

To pull content from other sites users are often encouraged to write a simple proxy:

  • http://example.com/ serves Javascript & HTML.
  • http://example.com/proxy/http://example.com allows arbitrary fetching.

Simples? No. Too many people write simple proxies which use PHP's curl function, or something similar, with little restriction on either the protocol or the destination of the requested resource.

Consider the following requests:

  • http://example.com/proxy.php?url=/etc/passwd
  • http://example.com/proxy.php?url=file:///etc/passwd

If you're using some form of Javascript/AJAX proxy make sure you test for this. (ObRandom: Searching google for inurl:"proxy.php?url=http:" shows this is a real problem. l33t.)
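A minimal sketch of the kind of check such a proxy needs before it fetches anything. It is written in Python rather than PHP, and the allow-list contents and the name is_safe_proxy_target are assumptions made purely for illustration:

```python
from urllib.parse import urlsplit

# Hypothetical allow-list: the only hosts this proxy is willing to fetch from.
ALLOWED_HOSTS = {"example.org", "api.example.org"}
ALLOWED_SCHEMES = {"http", "https"}


def is_safe_proxy_target(url):
    """Return True only for http(s) URLs that point at an allow-listed host."""
    parts = urlsplit(url)
    if parts.scheme.lower() not in ALLOWED_SCHEMES:
        # Rejects file:///etc/passwd and bare paths such as /etc/passwd.
        return False
    host = (parts.hostname or "").lower()
    return host in ALLOWED_HOSTS
```

Both of the requests listed above are refused by this check. A real proxy would also need to think about redirects and about hostnames which resolve to internal addresses, which this sketch ignores.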

ObQuote: "You're asking me out? That's so cute! What's your name again? " - 10 things I hate about you.

 

Proxies and Robots

I don't like repeating myself, but I'm very tempted to paste my mini-review of the Roomba Vacuum Cleaner robot into this blog.

Instead I will practise restraint and summarise:

  • It works. It works well.
  • It is a little noisy, but despite this it is great fun to watch.
  • It takes a long time to clean a few rooms, due to the "random walk" it performs. Despite this it is still fun to watch and actually useful.
  • Have I mentioned I grin like a child when it doesn't crash into things, and hums away past me on the floor?

£250. Worth. Every. Penny.

In more Debian-friendly news I've been fighting HTTP proxies today. I've noticed a lot of visitors to the various websites I host are logged as 127.0.0.1 - which is an irritation. My personal machine looks like this:

Internet -> Apache listening on *:80 -> thttpd on 127.0.0.1:xxxx

(This has been documented previously - primarily it is a security restriction. It means I can run per-UID web-servers.)

I had previously added a patch to thttpd to honour the X-Forwarded-For: header - so that it would receive the correct remote address passed on from Apache. However the fact that so many visitors were logged as coming from 127.0.0.1 meant it wasn't working 100% correctly, and I wanted to understand why.

Today I used ngrep to capture the incoming headers and the source of the problem became apparent:

skx:~# ngrep  -d lo  X-For ' port 1007'
..
T 127.0.0.1:41886 -> 127.0.0.1:1007 [AP]
  GET /about/ HTTP/1.1..Host: images.steve.org.uk..If-Modified-Since: Mon, 07
   Jun 2010 15:24:33 GMT..User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-U
  S; rv:1.9.1.10) Gecko/20100701 Iceweasel/3.5.10 (like Firefox/3.5.10)..Acce
  pt: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8..Accept
  -Language: en-us,en;q=0.5..Accept-Encoding: gzip,deflate..Accept-Charset: I
  SO-8859-1,utf-8;q=0.7,*;q=0.7..Referer: http://images.steve.org.uk/2009/11/
  20/img_0471.html..X-Forwarded-For: 127.0.0.1, 11.22.33.123..Cache-Control:
  max-age=0..X-Forwarded-Host: images.steve.org.uk..X-Forwarded-Server: image
  s.steve.org.uk..Connection: Keep-Alive....

The important part of that capture, just in case it didn't jump out, was:

X-Forwarded-For: 127.0.0.1, 11.22.33.123

My patch to thttpd was making it read the first address, rather than the second - which meant that requests were being logged as coming from 127.0.0.1 and avoiding my efforts to track sources.

Now I understand the problem - The X-Forwarded-For header is being tweaked by a proxy server, such as Squid, upstream of my server.

For the moment I've updated the thttpd patch to read:

        else if ( strncasecmp( buf, "X-Forwarded-For:", 16 ) == 0 )
          { char *tmp = NULL;

            /* Jump to the header-value  */
            cp = &buf[16];
            cp += strspn( cp, " \t" );

            /*
             * If the first address is 127.0.0.1, then we'll
             * jump over it.  Cope with Squid, et al.
             */
            if (  ( tmp = strstr( cp, "127.0.0.1, " ) ) != NULL )
              cp = tmp + strlen( "127.0.0.1, " );

            /* Parse the IP */
           inet_aton( cp, &(hc->client_addr.sa_in.sin_addr) );
        }

That's not perfect, but the alternative would be:

  • Install a patched version of libapache2-mod-rpaf to add an X-HONEST-REMOTE-IP header.
  • Update thttpd to use that header.

Or something equally hacky and security-by-obscurity-alike.

Really I just want a simple way of always getting the correct remote IP. Shouldn't be so hard, should it? *pout*.
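For comparison, here is a small sketch of a slightly more general rule than the patch above: walk the X-Forwarded-For list from the right and take the first address which isn't loopback. This assumes Apache appends the address it saw to the end of the list, as the capture above suggests. It is Python rather than the C used in thttpd, and the function name is hypothetical:

```python
import ipaddress


def client_from_forwarded_for(value):
    """Return the right-most non-loopback address in an X-Forwarded-For value.

    Returns None when the header holds no usable address.
    """
    for token in reversed(value.split(",")):
        token = token.strip()
        try:
            addr = ipaddress.ip_address(token)
        except ValueError:
            continue  # skip junk entries such as "unknown"
        if not addr.is_loopback:
            return str(addr)
    return None
```

Everything to the left of the entry added by your own proxy is supplied by the client, so even this can only ever be a best guess.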

ObQuote: "You don't mess with fate, Peanut. People die when they are meant to die. There's no discussion. There's no negotiation. When life's done, it's done." - Dead Like Me.

 

You are in a maze of twisty little passages, all alike

Debian package building scripts consist of several inter-related tools, which are documented well with manpages, but not in the code itself.

Should you have some free time I urge you to take a peek at the source code of dpkg-buildpackage, debsign, and debuild. They work. They work well, but they're not shining examples of clear code, are they?

Given that they work, and that they are complex I'm not sure I'm interested in improving things, but I do have a personal desire to run something like:

debuild -sa --output=/tmp/results/

What would that do? Instead of placing the resulting packages in the parent directory, which is what we have by default, it would place them in the named directory.

However looking over the code I see that there are too many places where "../$file" is hardwired. e.g. Take a look at dpkg-buildpackage and you see:

my $chg = "../$pva.changes";
..
open OUT, '>', $chg or syserr(_g('write changes file'));

If you change that to the simple code below it works - but suddenly debsign fails to find the file:

my $dir = $ENV{'OUTPUT'} || "..";
my $chg = $dir . "/" . $pva . ".changes";

Ugh.

I guess I can come up with some hack to index files in the parent, run a build process, and move only the new files into place. But that's sleazy at best, and still doesn't solve the problem that we have tools which we rely upon which are .. unfriendly to changes.
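That hack might look something like the following sketch. It is Python, the function name run_and_collect is made up for illustration, and it is not part of any Debian tool:

```python
import os
import shutil
import subprocess


def run_and_collect(command, watch_dir, dest_dir):
    """Run command, then move files which newly appeared in watch_dir to dest_dir."""
    before = set(os.listdir(watch_dir))
    subprocess.run(command, check=True)
    os.makedirs(dest_dir, exist_ok=True)
    moved = []
    for name in sorted(set(os.listdir(watch_dir)) - before):
        path = os.path.join(watch_dir, name)
        if os.path.isfile(path):
            shutil.move(path, os.path.join(dest_dir, name))
            moved.append(name)
    return moved
```

Something like run_and_collect(["debuild", "-sa"], "..", "/tmp/results/") would be the intended use. Note that it misses files which already existed and were merely overwritten, e.g. when rebuilding the same version twice.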

I thought I could handle this via:

debuild \
 --post-dpkg-buildpackage-hook="move-changes-contents /tmp/output/ ../%p_%v_`dpkg-architecture -qDEB_HOST_ARCH`.changes" \
 --no-tgz-check -sa -us -uc

(Here "move-changes-contents" would read the given .changes file and move both it and the contents listed in it to the specified directory.)

Unfortunately %v corresponds to the package version - which isn't the same as the version in the .changes filename. (i.e. epochs are stripped out.)
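To make the mismatch concrete, here is a small Python sketch of how the .changes filename relates to the package version. The helper name is hypothetical and the real tools are written in Perl:

```python
def changes_filename(package, version, arch):
    """Build the .changes filename, dropping any epoch from the version."""
    # The epoch is the part of a Debian version before the first colon,
    # e.g. the "1:" in "1:2.3-1".
    _, sep, rest = version.partition(":")
    stripped = rest if sep else version
    return f"{package}_{stripped}_{arch}.changes"
```

So a package foo at version 1:2.3-1 on amd64 ends up in foo_2.3-1_amd64.changes, which is not what a naive substitution of the full version would give.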

ObQuote: I am the servant of the power behind the Nothing - The Neverending Story.

 

As promised a new blogspam.net

A while back I mentioned that I was going to be updating and overhauling the blogspam.net service. That process is now almost complete. A couple of nights ago I overhauled the website, and today I've finally committed my last (planned) change to the repository for the purposes of migration. I started reworking the code a week or so ago, but as of this evening the code in the repository is the code the server is actually running.

The previous codebase was functional but a little hasty - and was implemented before I switched to per-UID server-hosting - so there was a need to clean things up and make sure permissions and similar niggles were checked.

The new, modular, codebase requires no root access, and will store all state (logs & transient caches) in a clean extensible fashion. The code is also much more flexible, making use of Module::Pluggable rather than Class::Pluggable. This allowed me to overhaul the API of the plugins (primarily to add an expire method such that each plugin has a well-defined means to expire any state it may maintain). Module::Pluggable is a great module - it allows me to treat plugins as first class objects, which wasn't the case with C::P.

Since all the code behind the service is Perl it is also now available on CPAN in addition to the mercurial repository where it is developed.

I see that the server is getting pretty popular these days, used by the likes of embedders.org, publiclive.com, etc. It doesn't hurt that ikiwiki, identi.ca, and other people include support in their distributions these days. Me? I mostly use it on debian-administration.org where it does a great job.

ObQuote: What's the name of that thing that if I eat it real fast, it's free? - Whip It.

 

Sanity testing drives

Recently I came across a situation where moving a lot of data around on a machine with a 3Ware RAID card ultimately killed the machine.

Testing the hardware for this in advance requires a test of both:

  • The individual drives, which make up the RAID array
  • The filesystem which is layered upon the top of it.

The former can be done with badblocks, etc. The latter requires a simple tool to create a bunch of huge files with "random" contents, then later verify they have the contents you expected.

With that in mind:

dt --files=1000  --size=100M [--no-delete|--delete]

This:

  • Creates, in turn, 1000 files.
  • Each created file will be 100MB long.
  • Each created file will have random contents written to it, and be closed.
  • Once closed the file will be re-opened and the MD5sum computed
    • Both in my code and by calling /usr/bin/md5sum.
    • If these sums mis-match, indicating a data-error, we abort.
  • Otherwise we delete the file and move on.

Adding "--no-delete" and "--files=100000" allows you to continue testing until your drive is full and you've tested every part of the filesystem.

Trivial toy, or possibly useful to sanity-check a filesystem? You decide. Or just:

hg clone http://dt.repository.steve.org.uk/

(dt == disk test)
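The write-then-verify loop described above can be sketched in a few lines of Python. This is not the dt source: the names are invented, and it compares a checksum taken while writing against one taken on re-read, rather than calling /usr/bin/md5sum:

```python
import hashlib
import os


def write_and_verify(path, size, chunk=1024 * 1024):
    """Write size random bytes to path, re-read them, and compare MD5 sums."""
    written = hashlib.md5()
    remaining = size
    with open(path, "wb") as handle:
        while remaining > 0:
            block = os.urandom(min(chunk, remaining))
            handle.write(block)
            written.update(block)
            remaining -= len(block)
        handle.flush()
        os.fsync(handle.fileno())

    reread = hashlib.md5()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(chunk), b""):
            reread.update(block)
    return written.hexdigest() == reread.hexdigest()


def disk_test(directory, files, size, delete=True):
    """Create files of size bytes each, stopping at the first mismatch."""
    for index in range(files):
        path = os.path.join(directory, f"dt-{index}.bin")
        if not write_and_verify(path, size):
            return False
        if delete:
            os.remove(path)
    return True
```

One caveat with any test of this shape: the re-read may be served from the page cache rather than the disk, so files much larger than RAM (or dropping caches between the two passes) give a more honest result.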

ObQuote: "Stand back boy! This calls for divine intervention! " - "Brain Dead"

 

OpenBSD rocks. Until it doesn't.

Recently I've been jumping upon the LDAP bandwagon, with one of my aims being to consolidate a lot of different login systems.

Configuring Linux, Apache, OpenVPN and similar things to authenticate against an LDAP server was almost painless.

Unfortunately OpenBSD is being a bit more painful, primarily because it doesn't use PAM. Instead you have two choices:

  • Configure login to authenticate against a RADIUS server, telling that server to authenticate against a (remote) LDAP server.
  • Use login_ldap to do authentication, but fetch all things via YP.

Neither solution is particularly pleasant, but the former is marginally less effort. The downside? I still have to run "adduser" to add the user to the system - which makes me think "why did I bother in the first place?"

Otherwise I spent the tail end of last week in York, taking pictures of ducks, geese, the city walls and similar things of fun.

ObQuote: "Well, well I see we have visitors... " - Hot Fuzz

 

I've accidentally written a replication-friendly filesystem

This evening I was mulling over a recurring desire for a simple, scalable, and robust replication filesystem. These days there are several out there, including Gluster.

For the past year I've personally been using chironfs for my replication needs - I have /shared mounted upon a number of machines and a write to it on any will be almost immediately reflected in the others.

This evening, when mulling over a performance problem with Gluster I was suddenly struck by the idea "Hey, Redis is fast. Right? How hard could it be?".

Although Redis is another one of those new-fangled key/value stores it has several other useful primitives, such as "SETS" and "LISTS". Imagine a filesystem which looks like this:

 /
 /srv
 /tmp
 /var/spool/tmp/

Couldn't we store those entries as members of a set? So we'd have:

  SET ENTRIES:/              -> srv, tmp, var
  SET ENTRIES:/var           -> spool
  SET ENTRIES:/var/spool     -> tmp
  SET ENTRIES:/var/spool/tmp -> (nil)

If you do that "readdir(path):" becomes merely "SMEMBERS ENTRIES:$path" ("SMEMBERS foo" being "members of the set named foo"). At this point you can add and remove directories with ease.

The next step, given an entry in a directory "/tmp", called "bob", is working out the most important things:

  • Is /tmp/bob a directory?
    • Read the key DIRECTORIES:/tmp/bob - if that contains a value it is.
  • What is the owner of /tmp/bob?
    • Read the key FILES:/tmp/bob:UID.
  • If this is a file what is the size? What are the contents?
    • Read the key FILES:/tmp/bob:size for the size.
    • Read the key FILES:/tmp/bob:data for the contents.
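The key layout above can be tried out without a Redis server at all. The sketch below is Python, not the FUSE/C code, and uses a tiny in-memory stand-in (FakeRedis, a name invented here) which offers only the four commands the layout needs. The helper names are likewise assumptions:

```python
class FakeRedis:
    """Tiny in-memory stand-in for the handful of Redis commands used here."""

    def __init__(self):
        self.strings = {}
        self.sets = {}

    def set(self, key, value):
        self.strings[key] = value

    def get(self, key):
        return self.strings.get(key)

    def sadd(self, key, member):
        self.sets.setdefault(key, set()).add(member)

    def smembers(self, key):
        return set(self.sets.get(key, set()))


def mkdir(store, parent, name):
    """Record name as a directory inside parent."""
    path = parent.rstrip("/") + "/" + name
    store.sadd("ENTRIES:" + parent, name)
    store.set("DIRECTORIES:" + path, "1")
    return path


def write_file(store, parent, name, data, uid):
    """Record name as a file inside parent, with owner, size and contents."""
    path = parent.rstrip("/") + "/" + name
    store.sadd("ENTRIES:" + parent, name)
    store.set("FILES:" + path + ":UID", str(uid))
    store.set("FILES:" + path + ":size", str(len(data)))
    store.set("FILES:" + path + ":data", data)
    return path


def readdir(store, path):
    """Equivalent of SMEMBERS ENTRIES:$path."""
    return sorted(store.smembers("ENTRIES:" + path))


def is_directory(store, path):
    """A path is a directory if DIRECTORIES:$path holds any value."""
    return store.get("DIRECTORIES:" + path) is not None
```

After mkdir(store, "/", "tmp") and write_file(store, "/tmp", "bob", b"hello", 1000), readdir(store, "/tmp") lists bob and is_directory(store, "/tmp/bob") is false, exactly as the questions above are answered.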

So with a little creative thought you end up with a filesystem which is entirely stored in Redis. At this point you're thinking "Oooh shiny. Fast and shiny". But then you think "Redis has built in replication support..."

Not bad.

My code is a little rough and ready, using libfuse2 & the hiredis C API for Redis. If there's interest I'll share it somewhere.

It should be noted that currently there are two limitations with Redis:

  • All data must fit inside RAM.
  • Master<->Slave replication is trivial, and is the only kind of replication you get.

In real terms the second limitation is the killer. You could connect to the Redis server on HostA from many locations - so you get a true replicated server. Given that the binary protocol is simple this might actually be practical in the real-world. My testing so far seems "fine", but I'll need to stress it some more to be sure.

Alternatively you could bind the filesystem to the redis server running upon localhost on multiple machines - one redis server would be the master, and the rest would be slaves. That gives you a filesystem which is read-only on all but one host, but if that master host is updated the slaves see it "immediately". (Does that setup even have a name? I'm thinking of master-write, slave-read, and that gets cumbersome.)

ObQuote: Please, please, please drive faster! -- Wanted

 

Sometimes you just wonder would other people like this?

Sometimes I write things that are for myself, and later decide to release on the off-chance other people might be interested.

I've hated procmail for a long time, but it is extremely flexible, and for the longest time I figured since I'd got things working the way I wanted there was little point changing.

When it comes to procmail there are few alternatives:

  • Exim filters.
  • Email::Filter.
  • Maildrop.

Unfortunately both Exim and Email::Filter suffer from a lack of "pipe" support. To be more specific Exim filters and Email::Filter allow you to pipe an incoming message to an external program - but they regard that as the end of the delivery process.

So, for example, you cannot receive a message (on STDIN), pipe it through crm114, then process that updated message. (i.e. The output of crm114).

Maildrop does allow pipes, but suffers from other problems which make me "not like it".

My own approach is to have a simple mail-sieve command which is configured thusly:

set maildir=/home/steve/Maildir
set logfile=/home/.trash.d/mail-sieve.log

#
#  Null-senders
#
Return-Path: /<>/        save .Automated.bounces/

#
#  Spam filter
#
filter /usr/bin/crm -u /home/steve/.crm /usr/share/crm114/mailreaver.crm

#
#  Spam?
#
X-CRM114-Status: /SPAM/   save .CRM.Spam/
X-CRM114-Status: /Unsure/ save .CRM.Unsure/


#
#  People / Lists
#
From: /foo@example.com/  save .people.foo/
From: /bar@example.com/  save .people.bar/
..
..

#
#  Domains
#
To: /steve.org.uk$/               save .steve.org.uk/
To: /debian-administration.org$/  save .debian-administration.org.personal/

#
#  All done.
#
save .inbox.unfiled/

On the one hand this is simple, readable, and complete enough for myself. On the other hand if I were going to make it releasable I think I'd probably want to add both conditionals and the ability to match upon multiple header values.

Getting there would probably involve something like this on the ~/.mail-filter side:

if ( ( From: /foo@example.com/ ) ||
     ( From: /bar@example.com/ ) )
{
   save .people.example.com/
   exit
}
# ps. remind me how much I hate parsers and lexers?

That starts to look very much like Exim's filter language, at which point I think "why should I bother". Pragmatically the simplest solution would be to add a "Filter" primitive to Email::Filter - and pretend I understood the nasty "Exit" settings.
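For what it's worth, the simple rule format shown earlier needs very little code to interpret. The following is a Python sketch and not the real mail-sieve: it assumes rules are tried top to bottom, that the first matching "save" wins, and that "filter" replaces the message with the command's output. It only reports the chosen folder rather than delivering anything:

```python
import email
import re
import subprocess

# Matches lines such as:  From: /foo@example.com/  save .people.foo/
RULE = re.compile(r"^([\w-]+):\s+/(.*)/\s+save\s+(\S+)$")


def deliver(config, raw_message):
    """Return the folder the first matching rule would save raw_message (bytes) to."""
    message = email.message_from_bytes(raw_message)
    for line in config.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or line.startswith("set "):
            continue  # blank lines, comments and settings do not route mail
        if line.startswith("filter "):
            # Pipe the message through the command and carry on with its output.
            result = subprocess.run(line[len("filter "):], shell=True,
                                    input=raw_message, stdout=subprocess.PIPE,
                                    check=True)
            raw_message = result.stdout
            message = email.message_from_bytes(raw_message)
            continue
        if line.startswith("save "):
            return line.split(None, 1)[1]  # unconditional catch-all
        match = RULE.match(line)
        if match:
            header, pattern, folder = match.groups()
            if any(re.search(pattern, value)
                   for value in message.get_all(header, [])):
                return folder
    return None
```

The "filter" branch is the part neither Exim filters nor Email::Filter offer: the output of the external program becomes the message which the remaining rules see.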

ObQuote: Andre, we don't use profanity or double negatives here at True Directions. - "But I'm a Cheerleader".

 

sysadmin.im considered harmful

Not for the first time I find my blog content copied and hosted elsewhere. This time via http://sysadmin.im/.

Mostly I care little if people rehost my content. But when people claim to have written it (e.g. "Posted by Admin") I get annoyed.

No explicit contact details are posted, probably to avoid complaints.

Update: Fixed URL. Stupid do.tted.na.mes.

 

My router serves Debian installer via gPXE

My Linksys router now serves the LAN with netboot images, allowing the simple installation of Debian GNU/Linux.

I updated /etc/dnsmasq.conf on the router itself to read:

# gPXE sends a 175 option.
dhcp-match=gpxe,175

# serve the "undionly.kpxe" file by default.
dhcp-boot=net:#gpxe,undionly.kpxe

# which will then pull the config via HTTP
dhcp-boot=http://static.steve.org.uk/netboot/gpxe.cfg

Then placed /tftproot/undionly.kpxe in place on the router. This combination of files causes:

  • The machine to netboot via undionly.kpxe (think of this like pxelinux.0.)
  • The file gpxe.cfg to be fetched via HTTP
  • Which in turn loads a simple menu.conf file, again via HTTP
  • At this point the user can select the flavour of Debian installer to run.

I have only one problem - it seems that adding the expected entry to boot locally fails:

MENU TITLE Netboot

LABEL Local.Disk
   bootlocal 0

Update: After a fresh sleep and more trial and error I discovered this works:

LABEL Local.Boot
   KERNEL local/chain.c32
   APPEND -

It gives a temporary warning about invalid sector number - but actually does boot locally by default, on both my EEEPC and my desktop machine.

If you want to snarf my netboot environment you are welcome to mirror :

Why did I do this in the first place? I wanted to install Squeeze upon my EEEPC (now done.)

ObQuote: Nice Cross. How'd you get the blood off? - Dead Like Me