A Look at Finding Out-of-Date Files

On occasion, I need a script that updates reports, but only when their sources have changed. Typically this means creating an XML file from log files or generating a static page for a web site. Here is a function that determines whether the sources have changed since the report was last created.

Writing the Function

The algorithm is simple enough: compare the mtime (last-modification time) of each source to that of the target. It's the error handling that needs some thought. The function should handle these cases:

Some of the sources are missing.

If not all the sources are there, the target cannot be made. The function should throw an exception for this.

The target file does not exist.

Indicate that the target file needs to be created.

With these in mind, it's time to write the function. First the documentation header:

  # --------------------------------------
  #       Name: out_of_date
  #      Usage: $is_out_of_date = out_of_date( $target, @sources );
  #    Purpose: To determine if the target file is out of date with respect to the sources.
  # Parameters:         $target -- Full path to target file.
  #                    @sources -- Full paths to the source files.
  #    Returns: $is_out_of_date -- TRUE if the target should be recreated.
  #
  sub out_of_date {
      my $target  = shift @_;
      my @sources = @_;

Now to check for missing sources:

      # check if all sources available
      my @missing = ();

      for my $source ( @sources ){

          # if the source file does not exist, add its name to the missing list
          if( ! -e $source ){
              push @missing, $source;
          }

      } # end for $source

      # if any missing, throw an exception
      if( @missing ){
          croak "out_of_date(): could not find the source(s): @missing";
      }

If the subroutine gets to this point, all the sources exist. Now check the target: if it doesn't exist, it's out of date.

      # check if target exists
      if( ! -e $target ){
          return 1; # a missing target is out of date
      }

Get the target's mtime from stat(); it is element 9 of the list that stat() returns.

      # save the target's mtime
      my $target_mtime = (stat( $target ))[9];
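stat() returns a 13-element list, and counting to index 9 is easy to get wrong. As a sketch of an alternative (not what the function above uses), the core File::stat module offers a named mtime accessor. Loading it with an empty import list keeps the built-in stat() unchanged, so both forms can be compared side by side on a temporary file:

```perl
use strict;
use warnings;
use File::Temp qw( tempfile );
use File::stat ();   # load without overriding the built-in stat()

# a throwaway file, deleted automatically at exit
my ( $fh, $file ) = tempfile( UNLINK => 1 );
close $fh;

# built-in stat() returns a 13-element list; element 9 is the mtime
my $mtime_list = ( stat $file )[9];

# File::stat returns an object with a named mtime accessor
my $mtime_obj = File::stat::stat( $file )->mtime;

print "list: $mtime_list, object: $mtime_obj\n";
```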

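Loop through the sources and compare each one's mtime to the target's. A more recent file has a greater mtime, so if any source's mtime is greater than or equal to the target's, the target is out of date. Equal mtimes count as out of date: stat() reports mtimes in whole seconds, so a source written in the same second as the target may have been written after it.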

      # compare mtimes of sources to target's
      for my $source ( @sources ){
          if( (stat( $source ))[9] >= $target_mtime ){
              return 1;
          }
      }
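The same test can be phrased as "is the newest source at least as new as the target?", finding the largest mtime once with max() from the core List::Util module. This is a sketch of that variant, not the function's actual code; the helper name `newest_mtime` is hypothetical, and it assumes every file exists, which out_of_date() has already checked by this point:

```perl
use strict;
use warnings;
use List::Util qw( max );
use File::Temp qw( tempfile );

# Hypothetical helper: the largest (most recent) mtime among the given files.
# Assumes every file exists; stat() on a missing file yields an empty list.
sub newest_mtime {
    my @files = @_;
    return max( map { my $mtime = ( stat $_ )[9]; $mtime } @files );
}

my ( $fh_a, $file_a ) = tempfile( UNLINK => 1 );
my ( $fh_b, $file_b ) = tempfile( UNLINK => 1 );
close $fh_a;
close $fh_b;

# backdate file_a by an hour so that file_b is clearly the newest
my $hour_ago = time - 3600;
utime $hour_ago, $hour_ago, $file_a or die "utime failed: $!\n";

my $newest = newest_mtime( $file_a, $file_b );
print "newest mtime: $newest\n";
```

With this helper, the loop becomes a single comparison: `return 1 if newest_mtime( @sources ) >= $target_mtime;`.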

Finally, if no source is as new as the target, the target is up to date, so return FALSE.

      # target must be up to date.
      return 0;
  }
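For reference, here are the pieces above assembled into one self-contained script, with a sketch of how a report script might call it. A few things are assumptions rather than the module's real code: the `use Carp` line (croak() comes from Carp, and the module header isn't shown here), the use of grep in place of the explicit missing-files loop (same behaviour, shorter), and the log and report file names, which are hypothetical:

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Carp qw( croak );
use File::Temp qw( tempdir );

sub out_of_date {
    my ( $target, @sources ) = @_;

    # every source must exist, otherwise the target cannot be built
    my @missing = grep { ! -e $_ } @sources;
    croak "out_of_date(): could not find the source(s): @missing" if @missing;

    # a missing target is always out of date
    return 1 if ! -e $target;

    # out of date if any source is as new as, or newer than, the target
    my $target_mtime = ( stat $target )[9];
    for my $source ( @sources ) {
        return 1 if ( stat $source )[9] >= $target_mtime;
    }
    return 0;
}

# usage sketch: rebuild a (hypothetical) report only when its log has changed
my $dir    = tempdir( CLEANUP => 1 );
my $log    = "$dir/access.log";
my $report = "$dir/report.xml";

open my $fh, '>', $log or die "could not open $log: $!\n";
close $fh;

# make the log a minute old so the report written below is clearly newer
my $old = time - 60;
utime $old, $old, $log or die "utime failed: $!\n";

my $first = out_of_date( $report, $log );    # report missing
if ( $first ) {
    open my $out, '>', $report or die "could not open $report: $!\n";
    close $out;    # stand-in for actually generating the report
}
my $second = out_of_date( $report, $log );   # report newer than log
print "first: $first, second: $second\n";
```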

Writing the Tests

Creating the tests for this function once again starts in the test directory:

  $ cd ~/perl5/lib/t/MyUtils
  $ >02-out_of_date.t
  $ chmod a+x 02-out_of_date.t

And then edit the file with your favourite editor.

  #!/usr/bin/env perl

  use strict;
  use warnings;

  use English       qw( -no_match_vars );  # Avoids regex performance penalty

  use Test::More;
  BEGIN{ use_ok( 'MyUtils' ); } # test #1: check to see if module can be compiled
  my $test_count = 1; # 1 for the use_ok() in BEGIN

  use MyUtils qw( out_of_date ); # import the out_of_date() function

Now we're going to need to create some files. Fortunately, Perl's standard modules let us build temporary file names without messing things up (too much). The END block removes the test files, if any, when the program finishes.

  use File::Basename;
  # the script's name without its extension, used to label the temp files
  ( my $self = basename( $0 )) =~ s{ \. .* \z }{}msx;

  use File::Spec;
  my $tmp_dir = File::Spec->tmpdir();

  my $source_1 = "$tmp_dir/${self}_source_1_$PID.tmp";
  my $source_2 = "$tmp_dir/${self}_source_2_$PID.tmp";
  my $source_3 = "$tmp_dir/${self}_source_3_$PID.tmp";
  my $target   = "$tmp_dir/${self}_target_$PID.tmp";

  # remove the files when done
  END {
      unlink $target, $source_1, $source_2, $source_3;
  }
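If you'd rather not build the names and write the END block by hand, the core File::Temp module can create a private directory that is removed automatically when the program exits. A sketch of that alternative; the file names inside the directory are illustrative, and File::Spec->catfile() joins the parts with the platform's separator:

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw( tempdir );

# CLEANUP => 1 removes the directory and everything in it at program exit
my $tmp_dir = tempdir( CLEANUP => 1 );

my $source_1 = File::Spec->catfile( $tmp_dir, 'source_1.tmp' );
my $source_2 = File::Spec->catfile( $tmp_dir, 'source_2.tmp' );
my $source_3 = File::Spec->catfile( $tmp_dir, 'source_3.tmp' );
my $target   = File::Spec->catfile( $tmp_dir, 'target.tmp' );

print "test files will live in $tmp_dir\n";
```

Because the directory is new and private to this run, there's no need to put $PID in the names to avoid collisions.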

The first test runs without any of the files existing, to exercise the exception. It isolates $EVAL_ERROR by running the test in its own block. This isn't strictly necessary, but isolating and encapsulating each test is a good habit: it prevents cross-contamination between tests.

Note that the actual message is trimmed to the length of the message created in the function. This is because croak() appends the location of the error, including the full path to the caller's file. Since that differs from machine to machine, it has to be discarded.

  {
      my $expected = "out_of_date(): could not find the source(s): $source_1 $source_2 $source_3";

      local $EVAL_ERROR;
      eval {
          my $is_out_of_date = out_of_date( $target, $source_1, $source_2, $source_3 );
      };
      my $actual = substr( $EVAL_ERROR, 0, length( $expected ));

      is( $actual, $expected, "all sources missing" );
      $test_count ++;
  }
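Instead of trimming the message with substr(), Test::More's like() can match it against a pattern; wrapping the expected text in \Q...\E escapes any regex metacharacters that might appear in the file paths. A sketch, using a hypothetical stand-in sub named `always_missing` in place of out_of_date() and made-up paths chosen to contain metacharacters:

```perl
use strict;
use warnings;
use Carp qw( croak );
use Test::More;

# Hypothetical stand-in for out_of_date(): always croaks the same way
sub always_missing {
    my @sources = @_;
    croak "out_of_date(): could not find the source(s): @sources";
}

my @sources  = ( '/tmp/a (1).tmp', '/tmp/b+c.tmp' );
my $expected = "out_of_date(): could not find the source(s): @sources";

{
    local $@;
    eval { always_missing( @sources ); 1 };

    # \Q...\E quotes the parentheses, '+' and '.' characters in the message
    like( $@, qr/\A\Q$expected\E at /, 'all sources missing' );
}

done_testing();
```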

For the next test, some of the source files are created, but not all. Again, this exercises the exception.

  # Create some of the source files
  open my $fh, '>', $source_1 or die "could not open $source_1: $OS_ERROR\n";
  close $fh;
  open $fh, '>', $source_3 or die "could not open $source_3: $OS_ERROR\n";
  close $fh;

  # test 2: some sources missing
  {
      my $expected = "out_of_date(): could not find the source(s): $source_2";

      local $EVAL_ERROR;
      eval {
          my $is_out_of_date = out_of_date( $target, $source_1, $source_2, $source_3 );
      };
      my $actual = substr( $EVAL_ERROR, 0, length( $expected ));

      is( $actual, $expected, "some sources missing" );
      $test_count ++;
  }

Now create the remaining source file and test the function while the target still doesn't exist.

  # create the remainder of the source files
  open $fh, '>', $source_2 or die "could not open $source_2: $OS_ERROR\n";
  close $fh;

  # test 3: target missing
  my $actual = out_of_date( $target, $source_1, $source_2, $source_3 );
  is( $actual, 1, "target missing" );
  $test_count ++;

Create the target and test again. The process first sleeps for two seconds so that there is a measurable difference between the mtimes (the mtime from stat() is in whole seconds).

  # create the target file
  sleep 2;
  open $fh, '>', $target or die "could not open $target: $OS_ERROR\n";
  close $fh;
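The sleep 2 calls add several seconds to every test run. An alternative, offered as a sketch rather than the approach used here, is to set the mtimes explicitly with Perl's built-in utime(), which takes an access time, a modification time and a list of files. The file names below are illustrative:

```perl
use strict;
use warnings;
use File::Temp qw( tempdir );

my $dir    = tempdir( CLEANUP => 1 );
my $source = "$dir/source.tmp";
my $target = "$dir/target.tmp";

# create both files "now"
for my $file ( $source, $target ) {
    open my $fh, '>', $file or die "could not open $file: $!\n";
    close $fh;
}

# backdate the source by an hour instead of sleeping between writes
my $earlier = time - 3600;
utime $earlier, $earlier, $source or die "could not set times on $source: $!\n";

my $source_mtime = ( stat $source )[9];
my $target_mtime = ( stat $target )[9];
print "source is older: ", ( $source_mtime < $target_mtime ? 'yes' : 'no' ), "\n";
```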

  # test 4: target is up to date
  $actual = out_of_date( $target, $source_1, $source_2, $source_3 );
  is( $actual, 0, "target is up to date" );
  $test_count ++;

Next, one of the source files is updated. Again, the process sleeps for two seconds so that there is a measurable difference between the mtimes.

  # update one of the source files
  sleep 2;
  open $fh, '>', $source_1 or die "could not open $source_1: $OS_ERROR\n";
  close $fh;

  # test 5: target is out of date
  $actual = out_of_date( $target, $source_1, $source_2, $source_3 );
  is( $actual, 1, "target is out of date" );
  $test_count ++;

Finally, tell Test::More that we're done testing.

  # tell Test::More we're done
  done_testing( $test_count );
