Useful Sed One-Liners for MUSHers

So I’ve got a lot of logs, which are a pain in the ass to clean, especially when there are epic amounts of OOC chatter, paged conversations, and all the other things that pop up in your sessions that aren’t poses.  Regular expressions can take care of the large majority of those, but they have a steep learning curve and are very difficult to read.

Here are some of the regular expressions that I commonly use, expressed as sed statements that read the input file, filter it and append the result to the ‘output’ file in the same directory. They should work pretty much anywhere sed is installed, with the exception of the statement that handles tab characters, which won’t work under Mac OS X.  For some reason that is totally beyond me, Apple ships a BSD version of sed that doesn’t understand the \t escape for tab characters.

If you’re on Windows, you can use Cygwin or get a shell account somewhere.

If you want to learn more about sed, try this.  It’s a pretty good resource.

Delete Everything To Start Of Scene

The whole mess of crap that you get when you sign in is hard to clean up with regular expressions.  While it’s theoretically possible to eliminate most of it with one regex that deletes everything between lines of equal signs and another that deletes lines starting with symbols, sed’s multi-line capabilities are something of a mystery to me, so I do something a little bit easier.

At the start of a scene, I stick in a line saying “START_SCENE”, then I run this expression to delete everything in the log up to and including that line:

sed -e  '1,/^START_SCENE/d' inputFile >> output
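
As a quick sanity check, here’s a minimal sketch that runs the same expression on a made-up log.  The log lines and the scratch directory are invented for illustration, and it writes with > instead of >> so that re-running it doesn’t tack a second copy onto the output.

```shell
# Work in a scratch directory so no real log gets overwritten.
cd "$(mktemp -d)"

# Build a tiny sample log; the contents are invented for illustration.
printf '%s\n' \
  '=============== Welcome ===============' \
  'You have 2 unread mails.' \
  'START_SCENE' \
  'Tokoga walks into the cantina.' > inputFile

# Delete from line 1 through the first line that starts with START_SCENE.
sed -e '1,/^START_SCENE/d' inputFile > output

cat output
```

Only the pose after the marker should be left in output.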

Remove Leading Whitespace

This one eliminates all the leading whitespace in a line, meaning spaces and tabs.  This is useful when people start their poses with spaces or tabs (someone on SWMUSH that I often RP with does this in literally every single pose).

sed -e 's/^[ \t]*//' inputFile >> output

Note that this might not be what you want in the end result.  I, for one, prefer to see subsequent paragraphs of a pose indented, without a blank line after the previous paragraph.  But once all the poses are formatted into individual paragraphs, there’s another expression that we can use to reset the indentation.

Also note that the version of sed that ships with OS X will not handle the tab character properly.  To get it to work, you’ll have to either get an updated version of sed via Fink or run it on another machine or shell account somewhere.
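
If installing another sed isn’t an option, one possible workaround is the POSIX character class [[:space:]], which matches spaces and tabs without needing the \t escape at all.  Both GNU sed and the BSD sed on OS X should understand it, but treat this as a sketch and try it on a copy of your log first.  The sample poses below are made up.

```shell
# Work in a scratch directory so no real log gets overwritten.
cd "$(mktemp -d)"

# Sample poses with leading spaces and a leading tab (printf expands \t here,
# so sed itself never has to interpret the escape).
printf '   Tokoga nods.\n\tHe draws his blaster.\nNo indent here.\n' > inputFile

# [[:space:]] covers both spaces and tabs at the start of the line.
sed -e 's/^[[:space:]]*//' inputFile > output

cat output
```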

Remove OOC Chatter

This isn’t a difficult regex to grasp, but it serves an important function.  What would first come to mind would be to do this:

sed -e '/<.*>/d' inputFile >> output

For the uninitiated, the .* between the angle brackets matches zero or more of anything: letters, numbers, etc.  This would delete lines containing either OOC chatter or channel chatter.  Unfortunately, problems arise when people use skills or need to do rolls:

<SKILL> Tokoga has rolled 14 on his Dodge check.  An average roll!

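This means we need what cool people call pattern exclusion.  What’s really wanted here is for all lines that start with a < to be deleted, except for those where SKILL or ROLL directly follows it.  A bracket expression like [^ROLL] won’t do it, since that only excludes single characters, not a whole word.  The easier route is to tell sed to skip the lines we want to keep (b with no label jumps to the end of the script) and then delete whatever else starts with a <:

sed -e '/^<SKILL>/b' -e '/^<ROLL>/b' -e '/^<.*>/d' inputFile >> output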
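
Here’s a small sketch of the keep-first, delete-after idea on some sample lines.  It branches past the SKILL and ROLL lines before the delete command can see them.  The channel name and the exact wording of the <ROLL> line are my own inventions, so adjust them to whatever your game actually prints.

```shell
# Work in a scratch directory so no real log gets overwritten.
cd "$(mktemp -d)"

# Sample log; the <Public> and <ROLL> lines are invented for illustration.
printf '%s\n' \
  '<Public> Anyone up for a scene?' \
  '<SKILL> Tokoga has rolled 14 on his Dodge check.  An average roll!' \
  '<ROLL> Tokoga rolls and gets 11.' \
  'Tokoga ducks behind the bar.' > inputFile

# "b" with no label jumps to the end of the script, so SKILL and ROLL lines
# are printed as-is and never reach the delete command that follows.
sed -e '/^<SKILL>/b' -e '/^<ROLL>/b' -e '/^<.*>/d' inputFile > output

cat output
```

The <Public> line should be gone, while the skill check, the roll and the pose survive.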

Remove Connects and Disconnects

The above statement will cut out the channel chatter:

<Welcome> Tokoga has disconnected.

But it won’t have an effect if he’s in the same room, since the disconnect emit doesn’t have a leading < symbol:

Tokoga has disconnected.

So we’ll need a statement to match lines of the form “* has disconnected.” and another one to match lines like “* has connected.”  We’ll also need statements to handle reconnects and partial disconnects.  The backslash before the period makes it match a literal period instead of any character.

sed -e '/^.* has disconnected\./d' inputFile >> output
sed -e '/^.* has connected\./d' inputFile >> output
sed -e '/^.* has partially disconnected\./d' inputFile >> output
sed -e '/^.* has reconnected\./d' inputFile >> output
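
These four can also be handed to a single sed call with one -e apiece, so the log is only read once.  The sketch below is a slightly stricter variant of my own: it anchors each pattern to the end of the line, on the assumption that the emit ends right after the period (no trailing spaces or Windows line endings), so that a pose that just happens to mention “has connected” mid-sentence is left alone.

```shell
# Work in a scratch directory so no real log gets overwritten.
cd "$(mktemp -d)"

# Sample log lines, invented for illustration.
printf '%s\n' \
  'Tokoga has connected.' \
  'Tokoga waves to the room.' \
  'Tokoga has partially disconnected.' \
  'Tokoga has reconnected.' \
  'Tokoga has disconnected.' > inputFile

# \. matches a literal period and $ pins it to the end of the line.
sed -e '/ has disconnected\.$/d' \
    -e '/ has connected\.$/d' \
    -e '/ has partially disconnected\.$/d' \
    -e '/ has reconnected\.$/d' inputFile > output

cat output
```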

Remove Pages

This is actually a little bit more complicated because of the way that PennMUSH handles pages.  We’re going to need separate expressions for someone else paging you and for you paging someone.  This one handles pages sent to you:

sed -e '/^From afar, /d' inputFile >> output

The next one handles you paging people:

sed -e '/^You paged /d' inputFile >> output
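
A quick sketch of both page patterns run together on sample lines.  The page text is made up, and your game may format pages a bit differently, so check a real log before trusting it.

```shell
# Work in a scratch directory so no real log gets overwritten.
cd "$(mktemp -d)"

# Sample log lines, invented for illustration.
printf '%s\n' \
  'From afar, Tokoga waves.' \
  "You paged Tokoga with 'be right back'." \
  'Tokoga sits down at the table.' > inputFile

# Anchoring with ^ means only lines that start with the page prefix are deleted.
sed -e '/^From afar, /d' -e '/^You paged /d' inputFile > output

cat output
```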

Indentation of Subsequent Paragraphs of Poses

Since all the extra whitespace and unnecessary crap has already been removed, we have something that we can start to work with.
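
One thing to watch: every command so far reads inputFile and appends to output, so running them one after another leaves several partly cleaned copies in output rather than one fully cleaned log.  A possible way around that is to chain the expressions in a single sed call, as in the sketch below.  The sample log, the second character’s name and the [[:space:]] class (used here in place of [ \t]) are my own choices for illustration.

```shell
# Work in a scratch directory so no real log gets overwritten.
cd "$(mktemp -d)"

# Sample log, invented for illustration.
printf '%s\n' \
  'You have 2 unread mails.' \
  'START_SCENE' \
  '   Tokoga walks into the cantina.' \
  '<Public> Anyone seen my droid?' \
  '<SKILL> Tokoga has rolled 14 on his Dodge check.  An average roll!' \
  'From afar, Kira waves.' \
  'Kira has connected.' \
  'Kira takes a seat at the bar.' > inputFile

# Each -e runs in order on every line: drop the login junk, strip leading
# whitespace, keep SKILL/ROLL lines, then delete channels, connects and pages.
sed -e '1,/^START_SCENE/d' \
    -e 's/^[[:space:]]*//' \
    -e '/^<SKILL>/b' -e '/^<ROLL>/b' -e '/^<.*>/d' \
    -e '/ has disconnected\./d' \
    -e '/ has connected\./d' \
    -e '/ has partially disconnected\./d' \
    -e '/ has reconnected\./d' \
    -e '/^From afar, /d' \
    -e '/^You paged /d' inputFile > output

cat output
```

What should be left is the two poses and the skill check, in their original order.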

Now, the only thing left to do is to put paragraphs of poses together.  If a person’s pose runs to more than one paragraph, don’t leave a blank line between the paragraphs of that pose.  This is the part that you’re going to have to do manually, but fortunately this is really the only task you’ll have to do as a human.  It can’t be automated, obviously, because the computer can’t distinguish between one person’s pose and another’s.

Double Space

The last step in the sequence will produce a series of poses separated by a double blank line, with the paragraphs of a multi-paragraph pose separated by a single blank line.  Skill checks and rolls also get a blank line before and after.  This step uses the double space command.

sed -e G inputFile >> output

This has the effect of double spacing all the output.  If you’ve done it right, this will leave you with two lines between poses of separate people and a single line between subsequent paragraphs of the same pose.  This raw formatting helps to preserve the pose-centric nature of the scene.  Indentation isn’t handled particularly well in MediaWiki, so we omit that, and instead use a single newline to separate those.
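
To see the mechanics on their own, here’s a tiny sketch with two made-up poses and no blank lines in the input.  G appends a newline plus the contents of the hold space (which is empty here) to each line, so every line comes out followed by one blank line.

```shell
# Work in a scratch directory so no real log gets overwritten.
cd "$(mktemp -d)"

# Two sample poses, invented for illustration.
printf '%s\n' 'Pose one.' 'Pose two.' > inputFile

# G tacks an empty line onto every input line.
sed -e G inputFile > output

cat output
```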

Posted in Uncategorized at June 24th, 2010. No Comments.
