Web::Chain project: Web/Chain/IO.pm
package Web::Chain::IO;
# doom@kzsu.stanford.edu
# 29 Aug 2004
=head1 NAME
Web::Chain::IO - provides input/output handle for chains of doomfile nodes
=head1 SYNOPSIS
use Web::Chain::IO;
my $dfh = Web::Chain::IO->new; # Create a doomfiles io handle
$dfh->input_location($input_location); # the current input directory, optional if full path is used.
$dfh->input_format('Rawtext'); # dynamically loads Web::Chain::IO::Rawtext
my $file_name_rawtext = "/home/doom/Thought/MEANDERINGS";
my $chain = $dfh->input($file_name_rawtext);
# Note: input (above) creates and returns a chain object.
use Web::Chain;
my $chain = Web::Chain->new; # Create a chain object to store (and manipulate) data
$dfh->output_location($location_html);
$dfh->output_format('Html'); # dynamically loads Web::Chain::IO::Html
$dfh->output($chain);
$dfh->input_location($location_html);
$dfh->input_format('Html');
$chain = $dfh->input($beginning_node_name, $end_node_name); # omit the end node to read all
# Get a listing of the existing DF nodes already in html webpage form:
my $dfh = Web::Chain::IO->new; # Create a chain io handle
$dfh->input_location($input_location); # the current input directory
$dfh->input_format('Html'); # dynamically loads Web::Chain::IO::Html
my $node_names_ref = $dfh->get_browse_sequence_from_input;
foreach ( @{ $node_names_ref } ) { print "$_\n"; }
@data_formats = $dfh->list_data_formats(); # e.g. Rawtext, Html...
=head1 DESCRIPTION
Web::Chain::IO is a wrapper object that lets you do file
input and output of different formats on a Web::Chain
structure.
Internally this module uses a pair of input and output handles
the types of which are determined at run time.
Different file formats are handled by handles derived from different
input or output classes.
The IO object (called "$dfh" in the examples in the SYNOPSIS)
largely serves just to pass on requests to the appropriate
format-specific handler.
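To make the delegation concrete, here is a minimal sketch of the
same pattern. The Sketch::IO packages and their Upper and Reverse
handlers are made up for illustration; they are not part of
Web::Chain, and the real handlers work on chain objects rather than
plain strings.

```perl
use strict;
use warnings;

# Hypothetical format handlers; each one only knows its own format.
package Sketch::IO::Upper;
sub new    { my ($class) = @_; return bless {}, $class }
sub output { my ($self, $text) = @_; return uc $text }

package Sketch::IO::Reverse;
sub new    { my ($class) = @_; return bless {}, $class }
sub output { my ($self, $text) = @_; return scalar reverse $text }

# The wrapper holds one handler and forwards requests to it.
package Sketch::IO;
use Carp;
sub new { my ($class) = @_; return bless { _output_handle => undef }, $class }

sub output_format {
    my ($self, $format) = @_;
    if ($format) {
        my $handler_class = ref($self) . '::' . $format;
        $self->{_output_handle} = $handler_class->new;
    }
    return $self->{_output_handle};
}

sub output {
    my ($self, @args) = @_;
    my $handle = $self->{_output_handle}
        or croak "output: no output format has been set";
    return $handle->output(@args);
}

package main;

my $io = Sketch::IO->new;
$io->output_format('Upper');
print $io->output("doomfiles"), "\n";    # handled by Sketch::IO::Upper
$io->output_format('Reverse');
print $io->output("doomfiles"), "\n";    # handled by Sketch::IO::Reverse
```

The caller only ever talks to the wrapper; swapping the format swaps
the object that does the work.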
=head1 METHODS
=over
=cut
use 5.006;
use strict;
use warnings;
use Carp;
use File::Basename qw( dirname );
use Web::Pro::Util qw( module_path );
use Web::Definitions qw( $DF_VERSION $DEBUG );
our $VERSION = $DF_VERSION;
=item B<new> - creates a new chain IO object, a "doomfiles handle" for
the actual input or output
=cut
sub new {
my $class = shift;
# my $chain = shift if $_[0]; # optional argument
($DEBUG) && print STDERR "Creating new doomfiles chain IO object\n";
bless {
# _chain => undef, # chain object
_input_handle => undef, # Chain::IO::<format> object
_output_handle => undef, # Chain::IO::<format> object
_input_location => undef, # input data format handle can use this if it wants
_output_location => undef, # output data format handle can use this if it wants
} , $class
}
=back
=head2 SIMPLE MUTATORS
These set and get the "location" values that may or may
not get used by the different possible input and output
routines for the different data formats. It's up to them,
if and how.
=over
=cut
=item input_location
=cut
sub input_location {
my ($self, $loc) = @_;
my $subname = ( caller(0) )[3];
$self -> {_input_location} = $loc if $loc ;
return( $self->{_input_location} );
}
=item output_location
=cut
sub output_location {
my ($self, $loc) = @_;
my $subname = ( caller(0) )[3];
$self -> {_output_location} = $loc if $loc ;
return( $self->{_output_location} );
}
=back
=head2 FUNNY MUTATORS
The "funny mutators" here do something a little unusual:
they manage the "doomfiles" input and/or output handles
that are stored inside the IO objects (e.g. $dfh).
This is a way of using aggregation to get polymorphic
behavior for the "input" and "output" methods, to allow
i/o of different data formats through the same interface.
(The goal is to be able to use $dfh->output and $dfh->input
for any data format of interest.)
So these "funny mutators" internally do requires of
another module, an input or output class which is
dynamically defined.
It's possible to have different IO handles in use
concurrently that use different data formats, and it's
also possible to change the format a given IO handle will
use on the fly.
I expect that it will be a common task to read in a
"doomfiles" chain from the published Html, and also read
in a small chain of new material in the Rawtext format,
then to merge the two of them and output revised Html
again (over-writing some of the original source Html).
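The run-time loading step can be sketched on its own, roughly as
follows. The helper name load_format_class is hypothetical, and the
check on the format name is an extra precaution that the methods
below do not perform. File::Spec is used only as a convenient core
module to load; it is not a Web::Chain format.

```perl
use strict;
use warnings;
use Carp;

# Build "<parent>::<format>" and load it at run time.
# Returns the class name on success, croaks on failure.
sub load_format_class {
    my ($parent, $format) = @_;

    # Assumption: format names are simple identifiers. This check is not
    # in the original code; it keeps arbitrary text out of the string eval.
    croak "bad format name: $format" unless $format =~ /\A\w+\z/;

    my $class = $parent . '::' . $format;
    eval "require $class; 1"
        or croak "require of $class failed: $@";
    return $class;
}

# File::Spec is a core module, used here only as a stand-in for a format class.
my $class = load_format_class( 'File', 'Spec' );
print "loaded $class\n";
print $class->catfile( 'Web', 'Chain', 'IO.pm' ), "\n";
```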
=over
=cut
=item output_format
=cut
sub output_format {
my ($self, $format) = @_;
my $subname = ( caller(0) )[3];
if ($format) {
my $parent = ref($self);
# # An alternate method: (TODO - is this identical? Subclass test.)
# my $parent = $subname;
# $parent =~ s/::[^:]*$//;
my $output_class = $parent . '::' . $format;
eval "require $output_class";
if ($@) {
croak "$subname: require of $output_class failed: $@";
}
my $output_handle = $output_class->new( $self );
$self->{_output_handle} = $output_handle;
}
return( $self->{_output_handle} );
}
=item input_format
=cut
sub input_format {
my ($self, $format) = @_;
my $subname = ( caller(0) )[3];
if ($format) {
my $parent = ref($self);
# # An alternate method: (TODO - is this identical? Subclass test.)
# my $parent = $subname;
# $parent =~ s/::[^:]*$//;
my $input_class = $parent . '::' . $format; ### TODO - maybe a better way?
($DEBUG) && print STDERR "requiring $input_class\n";
eval "require $input_class";
if ($@) {
croak "$subname: require of $input_class failed: $@";
}
my $input_handle = $input_class->new( $self );
$self->{_input_handle} = $input_handle;
}
return( $self->{_input_handle} );
}
=back
=head2 REAL METHODS (well... real frontends)
These do the real work, though essentially these are just
fronts for the IO::<format> objects...
=over
=cut
=item B<output> -
writes out a chain, or part of one, through the current output handle.
The chain object is required; the other two arguments are optional.
Three ordered arguments:
(1) a chain object
(2) the name of the first node to be output.
If undefined, start at the beginning and continue to the end
(the third parameter is not used in that case).
(3) (a) the final node in the sequence to output or
(b) the total number of nodes to output or
(c) an undefined value: go all the way to the end of the chain.
(see input location docs for more details, it's similar).
(TODO - write general description of both? Make this one more detailed?)
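One possible reading of those three arguments, sketched against a
plain array of node names instead of a chain object. The function
name select_nodes is hypothetical, and treating an all-digit
termination value as a count is my assumption; the format-specific
output code may tell the two cases apart differently.

```perl
use strict;
use warnings;

# Pick the names to output from a browse sequence.
#   $begin       - first node name, or undef for the whole sequence
#   $termination - last node name, a node count, or undef for "to the end"
sub select_nodes {
    my ($sequence, $begin, $termination) = @_;
    return @$sequence unless defined $begin;

    my ($start) = grep { $sequence->[$_] eq $begin } 0 .. $#$sequence;
    return unless defined $start;          # unknown begin node

    my $end = $#$sequence;                 # default: run to the end
    if ( defined $termination ) {
        if ( $termination =~ /\A\d+\z/ ) {  # assumption: digits mean a count
            $end = $start + $termination - 1;
            $end = $#$sequence if $end > $#$sequence;
        }
        else {
            my ($stop) = grep { $sequence->[$_] eq $termination } $start .. $#$sequence;
            return unless defined $stop;   # unknown final node
            $end = $stop;
        }
    }
    return @{$sequence}[ $start .. $end ];
}

my @sequence = qw( TOP ALPHA BETA GAMMA FIN );
print join( ' ', select_nodes( \@sequence, 'ALPHA', 'GAMMA' ) ), "\n";  # ALPHA BETA GAMMA
print join( ' ', select_nodes( \@sequence, 'BETA', 2 ) ), "\n";         # BETA GAMMA
print join( ' ', select_nodes( \@sequence, 'BETA' ) ), "\n";            # BETA GAMMA FIN
```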
=cut
sub output {
my ($self, $chain, $begin_node, $termination) = @_;
my $subname = ( caller(0) )[3];
$self->{_output_handle}->output($chain, $begin_node, $termination);
}
=item B<output_splice> -
output_splice must be given a $chain object to be output,
and also a second argument, the position in the browse sequence:
the name of an already existing node in the output location
where this chain of nodes will be "spliced" into place.
(As usual, we use the name of the node *before* the position
we want to indicate.)
Note that unlike L<output>, it is presumed that the entire chain
object will be output, so there is no need for the "begin node"
and "termination" arguments.
Two ordered arguments:
(1) a chain object
(2) the position in the browse sequence
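The effect on the browse sequence can be pictured with a plain
array of node names. This is only a sketch of the "insert after the
named node" convention; splice_after is a hypothetical helper, and
the real method also has to rewrite the next and prev links in the
output location.

```perl
use strict;
use warnings;

# Insert @$new_names into @$sequence just after the node named $position.
# Returns the new sequence, or an empty list if $position is not found.
sub splice_after {
    my ($sequence, $position, $new_names) = @_;
    my ($index) = grep { $sequence->[$_] eq $position } 0 .. $#$sequence;
    return unless defined $index;

    my @result = @$sequence;               # leave the caller's array alone
    splice( @result, $index + 1, 0, @$new_names );
    return @result;
}

my @existing = qw( TOP ALPHA BETA FIN );
my @added    = qw( NEW_ONE NEW_TWO );
print join( ' ', splice_after( \@existing, 'ALPHA', \@added ) ), "\n";
# TOP ALPHA NEW_ONE NEW_TWO BETA FIN
```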
=cut
sub output_splice {
my ($self, $chain, $position) = @_;
my $subname = ( caller(0) )[3];
$self->{_output_handle}->output_splice($chain, $position);
}
=item B<input> - input takes the name of something to get
input from (e.g. a file name), and passes it on to the
input method associated with the internal input handle
for the desired data format. The details may differ
depending on the underlying format that's being input.
The _input_location field for this object might define
the directory to look for the item named (or it might be
some database connection information, or the input
routine might ignore it entirely). In the case of
Rawtext input, the entire chain read in might be present
in the single named file, but in the case of Html input,
the chain would be distributed one node per file.
TODO -- this needs to be re-written (best to centralize it?)
The optional second argument is a limit condition, the
number of nodes to be read (to cover the case where you
don't want to read in a big chunk of the entire
collection of nodes). Once again though, the underlying
input routine is not required to do anything with that
parameter.
=cut
sub input {
my ($self, $name, $howmany) = @_;
my $subname = ( caller(0) )[3];
$self->{_input_handle}->input($name, $howmany);
}
=back
=head2 probing the browse sequence
The following two routines request either the input or the
output handle to report on the currently defined
"browse sequence". They return a reference to an array
of names.
=over
=item B<get_browse_sequence_from_input> - returns a reference to the
browse sequence of all nodes in the input location.
=cut
sub get_browse_sequence_from_input {
my ($self) = @_;
my $subname = ( caller(0) )[3];
$self->{_input_handle}->get_browse_sequence_from_input();
}
=item B<get_browse_sequence_from_output> - returns a reference to the browse sequence
of all currently existing nodes in the output location.
=cut
sub get_browse_sequence_from_output {
my ($self) = @_;
my $subname = ( caller(0) )[3];
$self->{_output_handle}->get_browse_sequence_from_output();
}
=back
=head2 probing a node's meta_info
The following two routines look up a node by name
(from either the input or the output location) returning
three items, the name of the node, the next node link,
and the prev node link. Returning the node
name when node name is also the argument might seem
redundant, but (a) it can be treated as a flag to
determine if the input was successful (prev or next would
be undef for the first and last nodes, respectively) and
also (b) where possible the format-specific code will
read the node name in a different way than the lookup is
performed -- e.g. in the case of Html, the returned name is
that of the page title, but the given name is the file name.
=over
=item B<input_meta_info_from_input> - given a node name, gets meta
information for a node located in the input location.
Example usage:
($node_name, $next_name, $prev_name) = $io_handle->input_meta_info_from_input($node_name)
=cut
sub input_meta_info_from_input {
my ($self, $name) = @_;
my $subname = ( caller(0) )[3];
$self->{_input_handle}->input_meta_info_from_input($name);
}
=item B<input_meta_info_from_output> - given a node name, gets meta
information for a node located in the output location.
Example usage:
($node_name, $next_name, $prev_name) = $io_handle->input_meta_info_from_output($node_name)
=cut
sub input_meta_info_from_output {
my ($self, $name) = @_;
my $subname = ( caller(0) )[3];
$self->{_output_handle}->input_meta_info_from_output($name);
}
=back
=head2 Utilities to list available formats
Methods that check what data format modules are available to
be dynamically loaded by input_format or output_format.
### TODO
### Would it be better to search all locations in @INC?
### Currently I just assume they're located just below this level:
### $this_file_location/IO/<format>.pm
### TODO
### Something like FindBin for a module location?
### (I wrote a module_path that uses caller.
### Could it be that FindBin does this already?)
=over
=item B<list_data_formats> - lists all Web::Chain::IO::* modules
(except for Web::Chain::IO::Common, which isn't a format).
Note that these are just the format names, not the full
module name: most likely you will prepend 'Web::Chain::IO::'
to get the module name.
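As a rough illustration of the idea, the following sketch lists
format names from a directory of .pm files. The function name
list_formats_in is hypothetical, and the temporary directory simply
stands in for the real IO/ directory; this version reads the
directory with opendir rather than changing into it.

```perl
use strict;
use warnings;
use File::Temp qw( tempdir );
use File::Spec;

# Return the sorted format names found as <name>.pm files in $dir,
# skipping Common, which is not a format.
sub list_formats_in {
    my ($dir) = @_;
    opendir( my $dh, $dir ) or die "can't open $dir: $!";
    my @names = map { /\A(.+)\.pm\z/ ? $1 : () } readdir($dh);
    closedir($dh);
    return sort grep { $_ ne 'Common' } @names;
}

# Demonstration with a throwaway directory standing in for .../Web/Chain/IO
my $dir = tempdir( CLEANUP => 1 );
for my $file (qw( Html.pm Rawtext.pm Common.pm notes.txt )) {
    open( my $fh, '>', File::Spec->catfile( $dir, $file ) )
        or die "can't write $file: $!";
    close($fh);
}
print join( ' ', list_formats_in($dir) ), "\n";    # Html Rawtext
```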
=cut
sub list_data_formats {
my ($self, $name) = @_;
my $subname = ( caller(0) )[3];
($DEBUG) && print STDERR "module path: " . module_path() . "\n";
my $module_path = module_path();
my $format_loc = $module_path . '/IO';
unless (-d $format_loc) { croak "Oddly enough $format_loc doesn't exist."; }
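# Read the directory directly rather than chdir-ing into it, so the
# caller's working directory is left alone.
opendir( my $dh, $format_loc ) or croak "$subname: can't open $format_loc: $!";
my @formats = sort grep { !/Common/ } map { /^(.+)\.pm$/ ? $1 : () } readdir($dh);
closedir($dh);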
return @formats;
}
=back
=head2 Experimental routines that do some format independent
processing useful to the format dependent code.
=over
=item B<find_first_node_from_input> - get_browse_sequence may be asked
to work with an existing df project without the standard
begin and end nodes (TOP and FIN). If TOP doesn't exist
in that case, there's a need to be able to find the actual
beginning node in the sequence, starting from anywhere
(a random node name, here called the "seed"). If the
given $seed value does not exist, this will return undef.
Example usage:
($seed = $file[0]) =~ s/\.html$//;
$begin_node_name = $self->find_first_node_from_input($seed, $location);
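The backwards walk itself can be sketched without any file I/O. In
this sketch the %links table is a hypothetical stand-in for the
meta info lookup, and the cycle guard is an addition that the
methods below do not have.

```perl
use strict;
use warnings;

# Hypothetical link table: each node name maps to its prev and next names.
my %links = (
    ALPHA => { prev => undef,   next => 'BETA'  },
    BETA  => { prev => 'ALPHA', next => 'GAMMA' },
    GAMMA => { prev => 'BETA',  next => undef   },
);

# Walk backwards from $seed until a node has no prev link.
# Returns undef for an unknown seed or a circular chain.
sub find_first_node {
    my ($seed, $links) = @_;
    return undef unless defined $seed;
    my %seen;
    my $name = $seed;
    while (1) {
        my $node = $links->{$name} or return undef;   # bad seed or broken link
        return undef if $seen{$name}++;               # cycle guard (an addition)
        return $name unless defined $node->{prev};
        $name = $node->{prev};
    }
}

print find_first_node( 'GAMMA', \%links ), "\n";    # ALPHA
```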
=cut
sub find_first_node_from_input {
my ($self, $seed, $location) = @_;
my $subname = ( caller(0) )[3];
my ($node_name, $name_h1, $prev, $next);
$prev = $seed; # initialize
do {
$node_name = $prev;
($name_h1, $prev, $next) = $self->input_meta_info_from_input( $node_name );
unless ( $name_h1 ) { # if name is not defined, we've probably been given a bad seed,
return undef; # so we return undef to indicate there's something wrong.
}
} until (not( $prev ));
return $node_name;
}
=item B<find_first_node_from_output> - get_browse_sequence may be asked
to work with an existing df project without the standard
begin and end nodes (TOP and FIN). If TOP doesn't exist
in that case, there's a need to be able to find the actual
beginning node in the sequence, starting from anywhere
(a random node name, here called the "seed"). If the
given $seed value does not exist, this will return undef.
Example usage:
($seed = $file[0]) =~ s/\.html$//;
$begin_node_name = $self->find_first_node_from_output($seed);
=cut
sub find_first_node_from_output {
my ($self, $seed, $location) = @_;
my $subname = ( caller(0) )[3];
my ($node_name, $name_h1, $prev, $next);
$prev = $seed; # initialize
do {
$node_name = $prev;
($name_h1, $prev, $next) = $self->input_meta_info_from_output( $node_name );
unless ( $name_h1 ) { # if name is not defined, we've probably been given a bad seed,
return undef; # so we return undef to indicate there's something wrong.
}
} until (not( $prev ));
return $node_name;
}
=item B<log_new_names> - given a reference to an array of node
names, include a list of them in a special node reserved
for the purpose of logging these additions. The default name for
that node is defined in Web::Definitions as $DF_WHATSNEW_NODE_NAME;
it can be overridden with the optional second argument.
Example usage:
$dfh->log_new_names( \@new_node_names, $log_node );
=cut
sub log_new_names {
my ($self, $new_node_list_ref, $log_node) = @_;
my $subname = ( caller(0) )[3];
$self->{_output_handle}->log_new_names($new_node_list_ref, $log_node);
}
=item B<generate_contents_node> - re-generate the "contents"
node (i.e. the table of contents), a listing of all nodes
found in the output location.
The default name for that node is defined in
Web::Definitions as $DF_CONTENTS_NODE_NAME; it can be
overridden with the optional argument. This method
reads in the existing contents file first, updates it, and
writes it out again: this ensures that next and prev links
remain intact. The main body of the node is replaced by
a new list describing the current browse sequence.
Example usage:
$dfh->generate_contents_node( $contents_node );
=cut
sub generate_contents_node {
my ($self, $contents_node) = @_;
my $subname = ( caller(0) )[3];
$self->{_output_handle}->generate_contents_node($contents_node);
}
1;
__END__
=back
=head1 TODO
Add something to SYNOPSIS about this:
($node_name, $next_name, $prev_name) = $io_handle->
input_meta_info($node_name, $location)
($node_name, $next_name, $prev_name) = $io_handle->
input_meta_info_from_input($node_name)
($node_name, $next_name, $prev_name) = $io_handle->
input_meta_info_from_output($node_name)
=head1 SEE ALSO
=over
=item L<Project Documentation|Web::Project>
=item L<Web::Chain>
=item L<Web::Chain::IO::Rawtext>
=item L<Web::Chain::IO::Html>
=back
=head1 AUTHOR
Joseph Brenner, E<lt>doom@kzsu.stanford.eduE<gt>
=head1 COPYRIGHT AND LICENSE
Copyright (C) 2004 by Joseph Brenner
This library is free software; you can redistribute it and/or modify
it under the same terms as Perl itself, either Perl version 5.8.2 or,
at your option, any later version of Perl 5 you may have available.
=head1 BUGS
None reported... yet.
=cut
Joseph Brenner,
Sat Nov 6 17:04:11 2004