Suppose you have a huge logfile and would like to quickly extract the lines that fall within a given small time interval. How would you do it?

Since the lines are ordered by timestamp, the fastest way (provided you do not keep any sort of index) is good old binary search, with the necessary housekeeping to account for line boundaries and to convert each timestamp from whatever format the logfile uses into epoch seconds for comparison with the target interval boundaries.
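To see what that housekeeping amounts to, here is a minimal hand-rolled sketch in plain Perl (the sub name, the throwaway sample file, and the assumption that every line starts with an integer epoch timestamp are all mine, for illustration only):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Return the byte offset of the first line whose leading number is >= $target,
# assuming every line starts with a number and the lines are sorted by it.
sub first_offset_at_or_after {
    my ($fh, $target) = @_;
    my ($lo, $hi) = (0, -s $fh);
    while ($lo < $hi) {
        my $mid = int(($lo + $hi) / 2);
        seek $fh, $mid, 0;
        <$fh> if $mid > 0;               # discard the partial line we landed in
        my $pos  = tell $fh;             # start of the next complete line
        my $line = <$fh>;
        if (defined $line && (split ' ', $line)[0] < $target) {
            $lo = $pos + length $line;   # the answer lies after this line
        } else {
            $hi = $mid;                  # the answer lies at or before $pos
        }
    }
    # $lo is now a line boundary at most one line short of the answer;
    # a short forward scan settles the boundary case.
    seek $fh, $lo, 0;
    while (defined(my $line = <$fh>)) {
        last if (split ' ', $line)[0] >= $target;
        $lo += length $line;
    }
    return $lo;
}

# Tiny demonstration on a throwaway file of sorted epoch timestamps.
my ($wfh, $name) = tempfile(UNLINK => 1);
print {$wfh} "100 alpha\n110 beta\n120 gamma\n130 delta\n";
close $wfh;
open my $fh, '<', $name or die "$name: $!\n";
printf "interval starts at byte %d\n", first_offset_at_or_after($fh, 115);
# prints "interval starts at byte 19"
```

All the fiddly parts live in the middle of that sub, which is exactly the bookkeeping you would rather have a module do for you.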

Since you are dealing with Perl here, it is natural to first look on CPAN for a module that somebody else has already written to do just this.

And of course somebody has. Enter File::SortedSeek by Dr. James Freeman. The module's interface is a bit weird, so it pays to read the documentation carefully.
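Its basic calling pattern looks roughly like this (a hedged sketch against the module's documented interface; the throwaway file of sorted numbers is made up for illustration):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::SortedSeek;
use File::Temp qw(tempfile);

# A throwaway sorted file to demonstrate on.
my ($wfh, $name) = tempfile(UNLINK => 1);
print {$wfh} map { "$_ payload\n" } 10, 20, 30, 40, 50;
close $wfh;

open my $fh, '<', $name or die "$name: $!\n";
File::SortedSeek::set_silent(1);    # suppress the module's warnings

# numeric() binary-searches the sorted file, leaves $fh positioned at the
# first line at or after the target, and returns that position as a byte
# offset.  The optional third argument is a sub that extracts the number
# to compare from each line (here: the first whitespace-separated field).
my $tell = File::SortedSeek::numeric($fh, 25, sub {
    my $line = shift;
    return defined $line ? (split ' ', $line)[0] : undef;
});

print scalar <$fh>;                 # the first line at or after 25
```

The extractor sub is what lets the same search routine work on arbitrary line formats, which is the hook the program below uses for timestamps.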

At any rate, here is a complete program that handles the task, assuming that the timestamp (in pretty much any format) is at the beginning of each line of the logfile:


#! /usr/bin/perl
use strict;
use warnings;
use Getopt::Long;
use File::SortedSeek;
use Time::ParseDate;

my ($from, $to);
usage() unless GetOptions("from=s" => \$from, "to=s" => \$to);
usage() unless @ARGV == 1;
if (defined $from) {
    $from = parsedate($from) or die "unable to parse --from date-time\n";
}
if (defined $to) {
    $to = parsedate($to) or die "unable to parse --to date-time\n";
}

my $filename = shift;

open my $log, '<', $filename or die "unable to open $filename: $!\n";
File::SortedSeek::set_silent(1);

# Each search leaves the filehandle positioned at the first line at or
# after the given time and returns that position as a byte offset.
# Search for the upper boundary first, so that after the second search
# the handle ends up at the start of the interval.  Searching for
# $to + 1 makes --to inclusive: $end is the offset of the first line
# strictly past the interval (timestamps are whole epoch seconds).
my ($beg, $end);
$end = File::SortedSeek::numeric($log, $to + 1, \&time2sec) if defined $to;
$beg = File::SortedSeek::numeric($log, $from,   \&time2sec) if defined $from;
unless (defined $beg) {           # no --from: print from the beginning
    seek $log, 0, 0;
    $beg = 0;
}
while (<$log>) {
    print;
    $beg += length($_);           # byte offset of the next line to read
    last if defined $end && $beg >= $end;
}

sub usage
{
    print STDERR <<EOF;
usage:
\t$0 --from date-time [--to date-time] filename
\t$0 -f date-time [-t date-time] filename
EOF
    exit 1;
}

sub time2sec
{
    my $line = shift;
    return undef unless defined $line;
    return parsedate($line, FUZZY => 1);
}

Nifty, eh?