Web::Chain project:    Web/Chain/IO.pm


     package Web::Chain::IO;
#                                doom@kzsu.stanford.edu
#                                29 Aug 2004

=head1 NAME

Web::Chain::IO - provides input/output handle for chains of doomfile nodes

=head1 SYNOPSIS

   use Web::Chain::IO;

   my $dfh = Web::Chain::IO->new; # Create a doomfiles io handle

   $dfh->input_location($input_location);  # the current input directory, optional if full path is used.
   $dfh->input_format('Rawtext');   # dynamically loads Web::Chain::IO::Rawtext
   my $file_name_rawtext = "/home/doom/Thought/MEANDERINGS";
   my $chain = $dfh->input($file_name_rawtext);
   # Note: input (above) creates and returns a chain object.

   use Web::Chain;
   my $chain = Web::Chain->new;   # Create a chain object to store (and manipulate) data

   $dfh->output_location($location_html);
   $dfh->output_format('Html');     # dynamically loads Web::Chain::IO::Html
   $dfh->output($chain);

   $dfh->input_location($location_html); 
   $dfh->input_format('Html');
   $chain = $dfh->input($beginning_node_name, $end_node_name);  # elide end node to read all

   # Get a listing of the existing DF nodes already in html webpage form:
   my $dfh = Web::Chain::IO->new; # Create a chain io handle
   $dfh->input_location($input_location);  # the current input directory
   $dfh->input_format('Html');   # dynamically loads Web::Chain::IO::Html
   my $node_names_ref = $dfh->get_browse_sequence_from_input;
   foreach ( @{ $node_names_ref } ) { print "$_\n"; }

   @data_formats = $dfh->list_data_formats();     # e.g. Rawtext, Html...



=head1 DESCRIPTION

Web::Chain::IO is a wrapper object that lets you do file 
input and output of different formats on a Web::Chain 
structure.

Internally this module uses a pair of input and output handles, 
the types of which are determined at run time.

Different file formats are handled by handles derived from different 
input or output classes.  

The IO object (called "$dfh" in the examples in the SYNOPSIS) 
largely serves just to pass on requests to the appropriate 
format-specific handler. 
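
A minimal, self-contained sketch of that delegation pattern follows.
The package names My::IO and My::IO::Upper are hypothetical stand-ins
used only for illustration; they are not part of this distribution,
and the real handlers do considerably more:

   use strict;
   use warnings;

   package My::IO::Upper;       # hypothetical format-specific handler
   sub new    { my ($class, $parent) = @_; return bless { parent => $parent }, $class }
   sub output { my ($self, $text) = @_; return uc $text }

   package My::IO;              # hypothetical wrapper, analogous to $dfh
   sub new { return bless { _output_handle => undef }, shift }
   sub output_format {
      my ($self, $format) = @_;
      $self->{_output_handle} = "My::IO::$format"->new($self) if $format;
      return $self->{_output_handle};
   }
   sub output {                 # just passes the request on to the handler
      my ($self, @args) = @_;
      return $self->{_output_handle}->output(@args);
   }

   package main;
   my $io = My::IO->new;
   $io->output_format('Upper');
   print $io->output("hello"), "\n";

The wrapper never needs to know what the handler does with its
arguments, which is what lets one interface serve several formats.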

=head1 METHODS

=over

=cut

use 5.006;
use strict; 
use warnings;
use Carp;
use File::Basename qw( dirname );
use Web::Pro::Util qw( module_path );
use Web::Definitions qw( $DF_VERSION $DEBUG );

our $VERSION = $DF_VERSION;

=item B<new> - creates a new chain IO object, a "doomfiles handle" for 
  the actual input or output

=cut

sub new {  
   my $class = shift;
#  my $chain = shift if $_[0]; # optional argument
  ($DEBUG) && print STDERR "Creating new doomfiles chain IO object\n";

  bless { 
#         _chain          => undef,   # chain object
         _input_handle   => undef,   # Chain::IO::<format> object
         _output_handle  => undef,   # Chain::IO::<format> object
         _input_location => undef,   # input data format handle can use this if it wants
         _output_location => undef,  # output data format handle can use this if it wants
        } , $class
}

=back

=head2 SIMPLE MUTATORS

These set and get the "location" values that may or may
not get used by the input and output routines for the
different data formats.  Whether and how a location is
used is up to each format-specific handler.

=over 

=cut

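=item B<input_location> - sets the input location if given a true 
  argument, and in either case returns the current value.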

=cut

sub input_location {
   my ($self, $loc) = @_;
   my $subname = ( caller(0) )[3];
   $self -> {_input_location} =  $loc  if $loc ;
   return( $self->{_input_location} );
}

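=item B<output_location> - sets the output location if given a true 
  argument, and in either case returns the current value.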

=cut

sub output_location {
   my ($self, $loc) = @_;
   my $subname = ( caller(0) )[3];
   $self -> {_output_location} =  $loc  if $loc ;
   return( $self->{_output_location} );
}

=back 

=head2 FUNNY MUTATORS

The "funny mutators" here do something a little unusual:
they manage the "doomfiles" input and/or output handles 
that are stored inside the IO objects (e.g. $dfh).

This is a way of using aggregation to get polymorphic
behavior for the "input" and "output" methods, to allow
i/o of different data formats through the same interface.

(The goal is to be able to use $dfh->output and $dfh->input 
for any data format of interest.)

So these "funny mutators" internally require another
module, an input or output class whose name is
determined at run time.  
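
As a rough illustration of the technique (not the code used by this
module, which does a string eval of the require), a class name can be
assembled at run time and loaded with a block eval.  The helper name
load_class is hypothetical, and the check on the format name is an
assumption added here to keep arbitrary strings away from require:

   use strict;
   use warnings;
   use Carp;

   sub load_class {
      my ($prefix, $format) = @_;
      croak "bad format name: $format" unless $format =~ /^\w+$/;
      my $class = $prefix . '::' . $format;
      (my $file = "$class.pm") =~ s{::}{/}g;   # Foo::Bar -> Foo/Bar.pm
      eval { require $file; 1 }
        or croak "require of $class failed: $@";
      return $class;
   }

   # File::Basename ships with perl, so it stands in for a format class
   my $class = load_class('File', 'Basename');
   print "$class loaded\n";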

It's possible to have different IO handles in use
concurrently that use different data formats, and it's
also possible to change the format a given IO handle will
use on the fly.

I expect that it will be a common task to read in a
"doomfiles" chain from the published Html, and also read
in a small chain of new material in the Rawtext format,
then to merge the two of them and output revised Html
again (over-writing some of the original source Html).

=over 

=cut

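=item B<output_format> - given a format name (e.g. 'Html'), loads the 
  matching format class and stores a new output handle of that class. 
  With or without an argument, returns the current output handle.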

=cut

sub output_format {
   my ($self, $format) = @_;
   my $subname = ( caller(0) )[3];

   if ($format) { 
     my $parent = ref($self);

     # # An alternate method:  (TODO - is this identical? Subclass test.)
     # my $parent = $subname;
     # $parent =~ s/::[^:]*$//;  

     my $output_class = $parent . '::' . $format;

     eval "require $output_class";
     if ($@) {
        croak "$subname: require of $output_class failed: $@";
     }

     my $output_handle = $output_class->new( $self );

     $self->{_output_handle} = $output_handle;
   }

   return( $self->{_output_handle} );
}

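=item B<input_format> - given a format name (e.g. 'Rawtext'), loads the 
  matching format class and stores a new input handle of that class. 
  With or without an argument, returns the current input handle.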

=cut

sub input_format {
   my ($self, $format) = @_;
   my $subname = ( caller(0) )[3];

   if ($format) { 

     my $parent = ref($self);

     # # An alternate method:  (TODO - is this identical? Subclass test.)
     # my $parent = $subname;
     # $parent =~ s/::[^:]*$//;  

     my $input_class = $parent . '::' . $format;   ### TODO - maybe a better way?

     ($DEBUG) && print STDERR "requiring $input_class\n";
     eval "require $input_class";
     if ($@) {
        croak "$subname: require of $input_class failed: $@";
     }

     my $input_handle = $input_class->new( $self );
     $self->{_input_handle} = $input_handle;
   }
   return( $self->{_input_handle} );
}

=back

=head2 REAL METHODS (well... real frontends)

These do the real work, though essentially these are just 
fronts for the IO::<format> objects...

=over

=cut

=item B<output> - writes out a chain via the current output handle.
  Three ordered arguments:
  (1) a chain object (required).
  (2) the first node to be output (optional). 
      If undefined, start at the beginning and continue to the end
      (the third parameter is not used in that case).
  (3) (a) the final node in the sequence to output or 
      (b) the total number of nodes to output or 
      (c) an undefined value: go all the way to the end of the chain. 
  (See the docs for the input method for more details; it's similar.) 
  (TODO - write general description of both?  Make this one more detailed?)

=cut

sub output {
   my ($self, $chain, $begin_node, $termination) = @_;
   my $subname = ( caller(0) )[3];

   $self->{_output_handle}->output($chain, $begin_node, $termination);

}

=item B<output_splice> - 
  output_splice must be given a $chain object to be output, 
  and also a second argument, the position in the browse sequence:
  the name of an already existing node in the output location 
  where this chain of nodes will be "spliced" into place. 
  (As usual, we use the name of the node *before* the position 
  we want to indicate.)
  Note that unlike C<output>, it is presumed that the entire chain 
  object will be output, so there is no need for the "begin node" 
  and "termination" arguments.
  Two ordered arguments:
  (1) a chain object 
  (2) the position in the browse sequence 

=cut

sub output_splice {
   my ($self, $chain, $position) = @_;
   my $subname = ( caller(0) )[3];

   $self->{_output_handle}->output_splice($chain, $position);
}


=item B<input> - input takes the name of something to get
  input from (e.g. a file name), and passes it on to the
  input method associated with the internal input handle
  for the desired data format.  The details may differ
  depending on the underlying format that's being input.
  The _input_location field for this object might define
  the directory to look for the item named (or it might be
  some database connection information, or the input
  routine might ignore it entirely).  In the case of
  Rawtext input, the entire chain read in might be present
  in the single named file, but in the case of Html input,
  the chain would be distributed one node per file.  
  TODO -- this needs to be re-written (best to centralize it?)
  The optional second argument is a limit condition, the
  number of nodes to be read (to cover the case where you
  don't want to read in a big chunk of the entire
  collection of nodes).  Once again though, the underlying
  input routine is not required to do anything with that
  parameter.

=cut

sub input {
   my ($self, $name, $howmany) = @_;
   my $subname = ( caller(0) )[3];

   $self->{_input_handle}->input($name, $howmany);
}

=back 

=head2 probing the browse sequence

The following two routines request either the input or the 
output handle to report on the currently defined 
"browse sequence".  They return a reference to an array
of names. 

=over

=item B<get_browse_sequence_from_input> - returns a reference to the 
  browse sequence of all nodes in the input location.

=cut 

sub get_browse_sequence_from_input { 
   my ($self) = @_;
   my $subname = ( caller(0) )[3];

   $self->{_input_handle}->get_browse_sequence_from_input();
}

=item B<get_browse_sequence_from_output> - returns a reference to the browse sequence
  of all currently existing nodes in the output location. 
 
=cut 

sub get_browse_sequence_from_output { 
   my ($self) = @_;
   my $subname = ( caller(0) )[3];

   $self->{_output_handle}->get_browse_sequence_from_output();
}

=back 

=head2 probing a node's meta_info 

The following two routines look up a node by name
(from either the input or the output location), returning 
three items: the name of the node, the next node link, 
and the prev node link.  Returning the node
name when the node name is also the argument might seem
redundant, but (a) it can be treated as a flag to
determine if the input was successful (prev or next would
be undef for the first and last nodes, respectively) and
also (b) where possible the format-specific code will
read the node name in a different way than the lookup is
performed -- e.g. in the case of Html, the returned name is
that of the page title, but the given name is the file name. 

=over 

=item B<input_meta_info_from_input> - given a node name, gets meta 
   information for a node located in the input location.
   Example usage:
     ($node_name, $next_name, $prev_name) = $io_handle->input_meta_info_from_input($node_name)

=cut 

sub input_meta_info_from_input { 
   my ($self, $name) = @_;
   my $subname = ( caller(0) )[3];

   $self->{_input_handle}->input_meta_info_from_input($name);
}

=item B<input_meta_info_from_output> - given a node name, gets meta 
   information for a node located in the output location.
   Example usage:
     ($node_name, $next_name, $prev_name) = $io_handle->input_meta_info_from_output($node_name)

=cut 

sub input_meta_info_from_output { 
   my ($self, $name) = @_;
   my $subname = ( caller(0) )[3];

   $self->{_output_handle}->input_meta_info_from_output($name);
}


=back

=head2 Utilities to list available formats

Methods that check what data format modules are available to 
be dynamically loaded by input_format or output_format.

### TODO 
### Would it be better to search all locations in @INC?
### Currently I just assume they're located just below this level:
###    $this_file_location/IO/<format>.pm
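
One possible shape for a search across all of @INC, offered only as a
sketch: the function name formats_in_inc is hypothetical, and it
assumes that every *.pm file found in the given subdirectory (other
than Common.pm) is a format module:

   use strict;
   use warnings;
   use File::Spec;

   sub formats_in_inc {
      my ($subdir) = @_;                         # e.g. 'Web/Chain/IO'
      my %seen;
      foreach my $inc ( grep { !ref } @INC ) {   # skip code-ref hooks in @INC
         my $dir = File::Spec->catdir( $inc, split( m{/}, $subdir ) );
         opendir( my $dh, $dir ) or next;        # not every @INC entry has it
         foreach my $file ( readdir $dh ) {
            next unless $file =~ /^([^.].*)\.pm$/;
            my $name = $1;
            next if $name eq 'Common';
            $seen{$name} = 1;                    # de-duplicate across @INC
         }
         closedir $dh;
      }
      my @names = sort keys %seen;
      return @names;
   }

   print "$_\n" for formats_in_inc('Web/Chain/IO');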

### TODO 
### Something like FindBin for a module location?
### (I wrote a module_path that uses caller.  
### Could it be that FindBin does this already?)

=over

=item B<list_data_formats> - lists all Web::Chain::IO::* modules
   (except for Web::Chain::IO::Common, which isn't a format).
   Note that these are just the format names, not the full 
   module name: most likely you will prepend 'Web::Chain::IO::' 
   to get the module name.

=cut

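sub list_data_formats {
   my ($self) = @_;
   my $subname = ( caller(0) )[3];

   my $module_path = module_path();
   ($DEBUG) && print STDERR "module path: $module_path\n";

   my $format_loc = $module_path . '/IO';
   unless (-d $format_loc) { croak "$subname: Oddly enough $format_loc doesn't exist."; }

   # Read the directory directly rather than using chdir plus a glob,
   # so the caller's working directory is left unchanged.
   opendir( my $dh, $format_loc )
     or croak "$subname: can't open $format_loc: $!";
   my @formats = grep { !/Common/ } map { /^([^.].*)\.pm$/ ? $1 : () } readdir($dh);
   closedir($dh);

   @formats = sort @formats;   # glob returned names sorted; keep that order
   return @formats;
}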

=back

=head2 Experimental routines that do some format independent 
  processing useful to the format dependent code. 

=over

=item B<find_first_node_from_input> - get_browse_sequence may be asked 
  to work with an existing df project without the standard 
  begin and end nodes (TOP and FIN).  If TOP doesn't exist
  in that case, there's a need to be able to find the actual 
  beginning node in the sequence, starting from anywhere 
  (a random node name, here called the "seed").  If the 
  given $seed value does not exist, this will return undef.
  Example usage:
    ($seed = $file[0]) =~ s/\.html$//;
    $begin_node_name = $self->find_first_node_from_input($seed, $location);

=cut

sub find_first_node_from_input {
  my ($self, $seed, $location) = @_;
  my $subname = ( caller(0) )[3];
  my ($node_name, $name_h1, $prev, $next);
  $prev = $seed; # initialize
  do { 
    $node_name = $prev;
    ($name_h1, $prev, $next) = $self->input_meta_info_from_input( $node_name );
    unless ( $name_h1 ) { # if name is not defined, we've probably been given a bad seed,
      return undef; # so we return undef to indicate there's something wrong.
    }
  } until (not( $prev ));
  return $node_name; 
}

=item B<find_first_node_from_output> - get_browse_sequence may be asked 
  to work with an existing df project without the standard 
  begin and end nodes (TOP and FIN).  If TOP doesn't exist
  in that case, there's a need to be able to find the actual 
  beginning node in the sequence, starting from anywhere 
  (a random node name, here called the "seed").  If the 
  given $seed value does not exist, this will return undef.
  Example usage:
    ($seed = $file[0]) =~ s/\.html$//;
    $begin_node_name = $self->find_first_node_from_output($seed);

=cut

sub find_first_node_from_output {
  my ($self, $seed, $location) = @_;
  my $subname = ( caller(0) )[3];
  my ($node_name, $name_h1, $prev, $next);
  $prev = $seed; # initialize
  do { 
    $node_name = $prev;
    ($name_h1, $prev, $next) = $self->input_meta_info_from_output( $node_name );
    unless ( $name_h1 ) { # if name is not defined, we've probably been given a bad seed,
      return undef; # so we return undef to indicate there's something wrong.
    }
  } until (not( $prev ));
  return $node_name; 
}

=item B<log_new_names> - given a reference to an array of node 
  names, include a list of them in a special node reserved 
  for the purpose of logging these additions.  The default name for 
  that node is defined in Web::Definitions as $DF_WHATSNEW_NODE_NAME;
  it can be overridden with the optional second argument. 
  Example usage: 
     $dfh->log_new_names( \@new_node_names, $log_node );

=cut 

sub log_new_names {
  my ($self, $new_node_list_ref, $log_node) = @_;
  my $subname = ( caller(0) )[3];

  $self->{_output_handle}->log_new_names($new_node_list_ref, $log_node);

}

=item B<generate_contents_node> - re-generate the "contents" 
  node (i.e. the table of contents), a listing of all nodes 
  found in the output location. 
  The default name for that node is defined in
  Web::Definitions as $DF_CONTENTS_NODE_NAME; it can be
  overridden with the optional argument.  This method 
  reads in the existing contents file first, updates it, and 
  writes it out again: this ensures that next and prev links 
  remain intact.  The main body of the node is replaced by 
  a new list describing the current browse sequence.
  Example usage:
   $dfh->generate_contents_node( $contents_node );

=cut 

sub generate_contents_node {
  my ($self, $contents_node) = @_;
  my $subname = ( caller(0) )[3];

  $self->{_output_handle}->generate_contents_node($contents_node);

}




1;
__END__

=back

=head1 TODO 

Add something to SYNOPSIS about this:

($node_name, $next_name, $prev_name) = $io_handle->
  input_meta_info($node_name, $location)

($node_name, $next_name, $prev_name) = $io_handle->
  input_meta_info_from_input($node_name)

($node_name, $next_name, $prev_name) = $io_handle->
  input_meta_info_from_output($node_name)



=head1 SEE ALSO

=over 

=item L<Project Documentation|Web::Project>

=item L<Web::Chain>

=item L<Web::Chain::IO::Rawtext>

=item L<Web::Chain::IO::Html>

=back 

=head1 AUTHOR

Joseph Brenner, E<lt>doom@kzsu.stanford.eduE<gt>

=head1 COPYRIGHT AND LICENSE

Copyright (C) 2004 by Joseph Brenner

This library is free software; you can redistribute it and/or modify
it under the same terms as Perl itself, either Perl version 5.8.2 or,
at your option, any later version of Perl 5 you may have available.

=head1 BUGS

None reported... yet.

=cut

     

Joseph Brenner, Sat Nov 6 17:04:11 2004