A Look at Cartesian Products

Copyright 2012 by Shawn H Corey. Some rights reserved.
Licence under CC BY-SA 3.0

Problem: You have a number of sets and you want all the combinations when choosing one element from each set.

In mathematics, these combinations are called the Cartesian product. They are also known as cross-products. In the database world, they are sometimes called cross-joins.

This article is to show how to create them using Perl's glob function and how to create them in a subroutine.

Perl's glob Function

Perl's glob uses csh's wildcard expansion. When given a list, it will insert each item from the list, one at a time, into the specification. For example:

    $ perl -e'@a=glob(q[g{a,e,i,o,u}t]);print "@a\n"'
    gat get git got gut

This ability can be used to create Cartesian products:

    $ perl -e'@a=glob(q[{foo,bar}{1,2,3}]);print "@a\n"'
    foo1 foo2 foo3 bar1 bar2 bar3

A Subroutine Using Recursion

Cartesian products are easy to do with recursion:

    #!/usr/bin/env perl

    use 5.16.0;
    use strict;
    use warnings;

    use Data::Dumper;
    use Storable qw( dclone );

    # Make Data::Dumper pretty
    $Data::Dumper::Sortkeys = 1;
    $Data::Dumper::Indent   = 1;

    # Set maximum depth for Data::Dumper, zero means unlimited
    local $Data::Dumper::Maxdepth = 0;

    # --------------------------------------
    #       Name: cartesian_product
    #      Usage: $products = cartesian_product( \@sets );
    #    Purpose: Generate the Cartesian products of the given sets via recursion.
    # Parameters:    \@sets -- an AoA of the sets
    #    Returns: $products -- an AoA of the products
    #
    sub cartesian_product {
      my $sets = shift @_;

      # step 1: handle the degenerate case(s)
      return [[]] unless @$sets;

      # step 2: reduce the problem by one
      my $first_set = shift @$sets;

      # step 3: recurse to get the partial results
      my $partial_products = cartesian_product( $sets );

      # step 4: combine the partial results to get the full results
      my $products = [];
      for my $item ( @$first_set ){
        for my $product ( @$partial_products ){
          push @$products, [ $item, @$product ];
        }
      }

      return $products;
    }

    # --------------------------------------
    # Main

    my @sets = (
      [qw( foo bar )],
      [qw( 1 2 3 )],
    );

    my $products = cartesian_product( dclone( \@sets ));

    print Dumper \@sets, $products;

Main

The main part of this script creates some sets for testing similar to those used in the previous example. It uses dclone() from Storable to copy the sets and calls the subroutine to create the Cartesian products. This is necessary since the subroutine is destructive in its use of the sets.

Cartesian Product Subroutine

The subroutine follow the four steps of recursion.

  1. The first step handles the degenerate case. If there are no sets, an array holding an empty array is returned.
  2. The second set saves the first set.
  3. The third step calls the subroutine recursively. It stores the partial results.
  4. The final step is to combine the saved set with the partial results. This is done with a loop within a loop to create new sets by adding each element of the saved set to the sets of the partial results.

A Subroutine Using Loops

    #!/usr/bin/env perl

    use 5.16.0;
    use strict;
    use warnings;

    use Data::Dumper;

    # Make Data::Dumper pretty
    $Data::Dumper::Sortkeys = 1;
    $Data::Dumper::Indent   = 1;

    # Set maximum depth for Data::Dumper, zero means unlimited
    local $Data::Dumper::Maxdepth = 0;

    # --------------------------------------
    #       Name: cartesian_product
    #      Usage: $products = cartesian_product( \@sets );
    #    Purpose: Generate the Cartesian products of the given sets via looping.
    # Parameters:    \@sets -- an AoA of the sets
    #    Returns: $products -- an AoA of the products
    #
    sub cartesian_product {
      my $sets = shift @_;

      my $products = [[]];
      for my $set ( reverse @$sets ){

        my $partial_products = $products;
        $products = [];

        for my $item ( @$set ){
          for my $product ( @$partial_products ){
            push @$products, [ $item, @$product ];
          }
        }

      }

      return $products;
    }

    # --------------------------------------
    # Main

    my @sets = (
      [qw( foo bar )],
      [qw( 1 2 3 )],
    );

    my $products = cartesian_product( \@sets );

    print Dumper \@sets, $products;

Main

Again, the main in this script creates a set of sets for demonstrate but unlike the recursion version, this one does not need to clone it. This subroutine is non-destructive.

Cartesian Product Subroutine

The first step of this subroutine is to assume there are no sets and the products are assign an array holding only an empty array. Then the sets are scanned in reverse order. This is just to ensure the result are the same as the previous script.

The partial products are saved and a new array is created to hold the new products.

The inner two loops that follow are the same as those in the previous script. They create the final products for the set.

Conclusion

Cartesian products at first may seem daunting but all that is need is two loops and a little recursion. Or three loops, if with a little more brain power.

Comments

  1. Al alternative approach can be found in my module, https://metacpan.org/module/Set::CartesianProduct::Lazy, which can be quite useful in certain cases, especially when the cartesian product could be substantially larger than the "source" data.

    ReplyDelete

Post a Comment