
NAME

Web::Chain::IO - provides an input/output handle for chains of nodes with dynamically defined input and output formats.

SYNOPSIS

   use Web::Chain::IO;

   my $dfh = Web::Chain::IO->new; # Create an io handle
   $dfh->input_location($input_location);
   $dfh->input_format('Rawtext');   
   my $file_name_rawtext = "/home/doom/Thought/MEANDERINGS";
   my $chain = $dfh->input($file_name_rawtext);

   use Web::Chain;
   $chain = Web::Chain->new;

   $dfh->output_location($location_html);
   $dfh->output_format('Html');  
   $dfh->output($chain);

   $dfh->input_location($location_html); 
   $dfh->input_format('Html');
   $chain = $dfh->input($beginning_node_name, $end_node_name);

   # Get a listing of the existing DF nodes already in html webpage form:
   $dfh = Web::Chain::IO->new;
   $dfh->input_location($input_location);  
   $dfh->input_format('Html');   
   my $node_names_ref = $dfh->get_browse_sequence_from_input;
   foreach ( @{ $node_names_ref } ) { print "$_\n"; }

   my @data_formats = $dfh->list_data_formats();

DESCRIPTION

Web::Chain::IO is a wrapper object that lets you do file input and output in different formats on a Web::Chain structure.

Internally, this module uses a pair of handles, one for input and one for output, whose types are determined at run time.

Different file formats are handled by handles derived from different input or output classes.

The IO object (called "$dfh" in the examples in the SYNOPSIS) largely serves just to pass on requests to the appropriate format-specific handler.

METHODS

new - creates a new chain IO object for the actual input or output

SIMPLE MUTATORS

These set and get the "location" values, which may or may not be used by the input and output routines for the different data formats. Whether and how a location is used is up to each format-specific routine.

input_location

output_location
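A combined set-and-get accessor of this kind can be sketched as below. This is only an illustration, not the module's code: the package name Sketch::IO is hypothetical, and storing the value under a "_input_location" key is an assumption based on the field name mentioned in the description of "input".

```perl
use strict;
use warnings;

# Hypothetical stand-in for the IO object; only the accessor is sketched.
package Sketch::IO {
    sub new { my ($class) = @_; return bless {}, $class }

    # Combined accessor: sets the value when given an argument,
    # and returns the current value either way.
    sub input_location {
        my $self = shift;
        $self->{_input_location} = shift if @_;
        return $self->{_input_location};
    }
}

package main;

my $dfh = Sketch::IO->new;
$dfh->input_location('/tmp/rawtext');    # set
print $dfh->input_location, "\n";        # get
```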

FUNNY MUTATORS

The "funny mutators" here do something a little unusual: they manage the input and/or output handles that are stored inside the IO objects (such as $dfh, in the SYNOPSIS).

This is a way of using aggregation to get polymorphic behavior for the "input" and "output" methods, to allow i/o of different data formats through the same interface.

(The goal is to be able to use $dfh->output and $dfh->input for any data format of interest.)

So these "funny mutators" internally require another module: an input or output class that is chosen dynamically.

It's possible to have different IO handles in use concurrently that use different data formats, and it's also possible to change the format a given IO handle will use on the fly.
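The general technique (a mutator that loads a class by name at run time and keeps an instance of it inside the object) might look something like the following sketch. It is not the module's actual implementation: the Sketch::IO namespace, the "_output_handle" key, and the two tiny format modules written to a temporary directory are all hypothetical, and exist only so that the example is self-contained.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Path qw(make_path);

# Write two tiny hypothetical format modules into a temporary directory
# so that the sketch is self-contained.
my $lib = tempdir( CLEANUP => 1 );
make_path("$lib/Sketch/IO");
for my $format (qw(Rawtext Html)) {
    open my $fh, '>', "$lib/Sketch/IO/$format.pm" or die "Cannot write: $!";
    print {$fh} <<"END_MODULE";
package Sketch::IO::$format;
sub new    { return bless {}, shift }
sub output { my (\$self, \$chain) = \@_; return "$format: \$chain" }
1;
END_MODULE
    close $fh or die "Cannot close: $!";
}
unshift @INC, $lib;

package Sketch::IO {
    sub new { return bless {}, shift }

    # The "funny mutator": load the class for the named format at run
    # time and store a handle of that class inside this object.
    sub output_format {
        my ($self, $format) = @_;
        die "Bad format name: $format" unless $format =~ /\A\w+\z/;
        my $class = "Sketch::IO::$format";
        ( my $file = "$class.pm" ) =~ s{::}{/}g;
        require $file;
        $self->{_output_handle} = $class->new;
        return $format;
    }

    # Front-end: pass the request on to whichever handle is current.
    sub output { my $self = shift; return $self->{_output_handle}->output(@_) }
}

package main;

my $dfh = Sketch::IO->new;
$dfh->output_format('Rawtext');
print $dfh->output('some chain'), "\n";
$dfh->output_format('Html');    # change the format on the fly
print $dfh->output('some chain'), "\n";
```

Because the handle lives inside each object, two such objects can hold handles of different formats at the same time.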

A common task is to read in a chain of nodes from the published Html, read in a small chain of new material in the Rawtext format, merge the two, and output the revised Html (overwriting some of the original source Html).

output_format

input_format

GENERAL METHODS

These are essentially just front-ends for the IO::<format> objects, which do the real work.

output - must be given a $chain object to work on. The optional second and third arguments select which nodes are output. Three ordered arguments:

   (1) a chain object
   (2) the first node to be output. If undefined, start at the beginning and continue to the end (the third argument is not used in that case).
   (3) one of: (a) the final node in the sequence to output, (b) the total number of nodes to output, or (c) an undefined value, meaning go all the way to the end of the chain.

(See the input location docs for more details; it's similar.) (TODO - write general description of both? Make this one more detailed?)
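The argument convention described for "output" could be resolved against an ordered list of node names roughly as follows. This is a sketch only: the function name select_nodes is hypothetical, and the rule used to tell a count from a node name (a purely numeric value is a count) is an assumption, since the documentation does not say how the two are distinguished.

```perl
use strict;
use warnings;

# Resolve the (begin, end-or-count) arguments against an ordered list
# of node names. ASSUMPTION: a purely numeric third argument is taken
# as a count; anything else is taken as a node name.
sub select_nodes {
    my ($names, $begin, $limit) = @_;
    return @{$names} unless defined $begin;    # whole chain

    my ($start) = grep { $names->[$_] eq $begin } 0 .. $#{$names};
    return unless defined $start;              # unknown begin node

    my $stop = $#{$names};                     # default: to the end
    if ( defined $limit ) {
        if ( $limit =~ /\A\d+\z/ ) {
            $stop = $start + $limit - 1;
        }
        else {
            my ($found) = grep { $names->[$_] eq $limit } $start .. $#{$names};
            $stop = $found if defined $found;
        }
    }
    $stop = $#{$names} if $stop > $#{$names};
    return @{$names}[ $start .. $stop ];
}

my @sequence = qw(TOP alpha beta gamma FIN);
print join( ' ', select_nodes( \@sequence, 'alpha', 'gamma' ) ), "\n";
print join( ' ', select_nodes( \@sequence, 'alpha', 2 ) ),       "\n";
print join( ' ', select_nodes( \@sequence, 'beta' ) ),           "\n";
```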

output_splice - must be given a $chain object to be output, and also a second argument, the position in the browse sequence: the name of an already existing node in the output location where this chain of nodes will be "spliced" into place. (As usual, we use the name of the node *before* the position we want to indicate.) Note that unlike output, it is presumed that the entire chain object will be output, so there is no need for the "begin node" and "termination" arguments. Two ordered arguments:

   (1) a chain object
   (2) the position in the browse sequence
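The "name of the node before the position" convention can be illustrated on a plain list of node names. This is a minimal sketch with a hypothetical helper, splice_after; it works on names only and does not touch real nodes or files.

```perl
use strict;
use warnings;

# Insert the names of a new chain into an existing browse sequence,
# immediately after the named node. Returns the new sequence, or an
# empty list if the named node is not present.
sub splice_after {
    my ($sequence, $after_name, $new_names) = @_;
    my @result = @{$sequence};
    my ($pos) = grep { $result[$_] eq $after_name } 0 .. $#result;
    return unless defined $pos;
    splice @result, $pos + 1, 0, @{$new_names};
    return @result;
}

my @existing = qw(TOP alpha beta FIN);
my @spliced  = splice_after( \@existing, 'alpha', [qw(new_one new_two)] );
print "@spliced\n";
```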

input - takes the name of something to get input from (e.g. a file name), and passes it on to the input method associated with the internal input handle for the desired data format. The details may differ depending on the underlying format that's being input.

The _input_location field for this object might define the directory to look in for the named item (or it might be some database connection information, or the input routine might ignore it entirely). In the case of Rawtext input, the entire chain read in might be present in the single named file, but in the case of Html input, the chain would be distributed one node per file.

TODO -- this needs to be re-written (best to centralize it?)

The optional second argument is a limit condition, the number of nodes to be read (to cover the case where you don't want to read in a big chunk of the entire collection of nodes). Once again, though, the underlying input routine is not required to do anything with that parameter.

probing the browse sequence

The following two routines request either the input or the output handle to report on the currently defined "browse sequence". They return a reference to an array of names.

get_browse_sequence_from_input - returns a reference to the browse sequence of all nodes in the input location.

get_browse_sequence_from_output - returns a reference to the browse sequence of all currently existing nodes in the output location.
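One way such a sequence could be assembled is by walking "next" links from a starting node. The sketch below assumes an in-memory hash of links in place of a real input or output location; the hash, the helper name browse_sequence, and the loop guard are all illustrative assumptions.

```perl
use strict;
use warnings;

# Hypothetical in-memory stand-in for a location: node name => next name.
my %next_of = (
    TOP   => 'alpha',
    alpha => 'beta',
    beta  => 'FIN',
    FIN   => undef,
);

# Walk the "next" links from a starting node and return a reference
# to the array of names visited, guarding against loops.
sub browse_sequence {
    my ($links, $start) = @_;
    my (@names, %seen);
    my $name = $start;
    while ( defined $name && exists $links->{$name} && !$seen{$name}++ ) {
        push @names, $name;
        $name = $links->{$name};
    }
    return \@names;
}

my $node_names_ref = browse_sequence( \%next_of, 'TOP' );
print "$_\n" for @{$node_names_ref};
```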

probing a node's meta_info

The following two routines look up a node by name (from either the input or the output location), returning three items: the name of the node, the next node link, and the prev node link. Returning the node name when the node name is also the argument might seem redundant, but (a) it can be treated as a flag to determine if the input was successful (prev or next would be undef for the first and last nodes, respectively) and (b) where possible, the format-specific code will read the node name in a different way than the lookup is performed -- e.g. in the case of Html, the returned name is that of the page title, but the given name is the file name.

input_meta_info_from_input - given a node name, gets meta information for a node located in the input location. Example usage:

   ($node_name, $next_name, $prev_name) = $io_handle->input_meta_info_from_input($node_name)

input_meta_info_from_output - given a node name, gets meta information for a node located in the output location. Example usage:

   ($node_name, $next_name, $prev_name) = $io_handle->input_meta_info_from_output($node_name)
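Using the returned node name as a success flag might look like the sketch below. The %nodes hash and the meta_info function are hypothetical stand-ins for a real location and for the two methods above; the "title" entry imitates the Html case, where the returned name is read from the node itself rather than from the lookup key.

```perl
use strict;
use warnings;

# Hypothetical node store keyed by file name; the title plays the part
# of the node name read back from the node itself.
my %nodes = (
    alpha => { title => 'Alpha', next => 'beta', prev => undef },
    beta  => { title => 'Beta',  next => undef,  prev => 'alpha' },
);

# Return (name, next, prev); an empty list means the node was not found.
sub meta_info {
    my ($store, $lookup_name) = @_;
    my $node = $store->{$lookup_name} or return;
    return ( $node->{title}, $node->{next}, $node->{prev} );
}

for my $lookup (qw(alpha beta missing)) {
    my ($node_name, $next_name, $prev_name) = meta_info( \%nodes, $lookup );
    if ( defined $node_name ) {
        printf "%s: next=%s prev=%s\n", $node_name,
            $next_name // '(none)', $prev_name // '(none)';
    }
    else {
        print "$lookup: not found\n";
    }
}
```

Checking the name rather than next or prev matters because either link can legitimately be undef at the ends of the sequence.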

Utilities to list available formats

Methods that check what data format modules are available to be dynamically loaded by input_format or output_format.

TODO: Would it be better to search all locations in @INC? Currently I just assume they're located just below this level:

   $this_file_location/IO/<format>.pm

list_data_formats - lists all Web::Chain::IO::* modules (except for Web::Chain::IO::Common, which isn't a format). Note that these are just the format names, not the full module name: most likely you will prepend 'Web::Chain::IO::' to get the module name.
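A directory scan of this kind could be sketched as follows. The function name list_formats_in is hypothetical, and the demonstration builds a throwaway directory of empty files rather than inspecting any installed modules; only the "IO/<format>.pm" layout and the exclusion of Common come from the text above.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Path qw(make_path);

# List the format names found as <dir>/IO/<format>.pm, skipping Common.
sub list_formats_in {
    my ($dir) = @_;
    opendir my $dh, "$dir/IO" or return;
    my @formats = sort
        grep { $_ ne 'Common' }
        map  { /\A(\w+)\.pm\z/ ? $1 : () } readdir $dh;
    closedir $dh;
    return @formats;
}

# Demonstration against a throwaway directory with empty module files.
my $dir = tempdir( CLEANUP => 1 );
make_path("$dir/IO");
for my $name (qw(Rawtext Html Common)) {
    open my $fh, '>', "$dir/IO/$name.pm" or die "Cannot create file: $!";
    close $fh or die "Cannot close file: $!";
}
print join( ', ', list_formats_in($dir) ), "\n";
```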

Experimental routines that do some format-independent processing useful to the format-dependent code.

find_first_node_from_input - get_browse_sequence may be asked to work with an existing df project without the standard begin and end nodes (TOP and FIN). If TOP doesn't exist in that case, there's a need to be able to find the actual beginning node in the sequence, starting from anywhere (a random node name, here called the "seed"). If the given $seed value does not exist, this will return undef. Example usage:

   ($seed = $file[0]) =~ s/\.html$//;
   $begin_node_name = $self->find_first_node($seed, $location);

find_first_node_from_output - get_browse_sequence may be asked to work with an existing df project without the standard begin and end nodes (TOP and FIN). If TOP doesn't exist in that case, there's a need to be able to find the actual beginning node in the sequence, starting from anywhere (a random node name, here called the "seed"). If the given $seed value does not exist, this will return undef. Example usage:

   ($seed = $file[0]) =~ s/\.html$//;
   $begin_node_name = $self->find_first_node($seed);
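Finding the beginning node from an arbitrary seed amounts to following "prev" links until none is left. The sketch below assumes a hash of prev links in place of a real location; the hash, the helper name first_node_from, and the loop guard are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a location: node name => previous node name.
my %prev_of = (
    intro  => undef,
    middle => 'intro',
    ending => 'middle',
);

# Follow "prev" links back from the seed until a node with no known
# predecessor is reached. Returns undef if the seed does not exist.
sub first_node_from {
    my ($links, $seed) = @_;
    return undef unless defined $seed && exists $links->{$seed};
    my %seen;
    my $name = $seed;
    while ( !$seen{$name}++ ) {
        my $prev = $links->{$name};
        last unless defined $prev && exists $links->{$prev};
        $name = $prev;
    }
    return $name;
}

print first_node_from( \%prev_of, 'ending' ), "\n";
```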

log_new_names - given a reference to an array of node names, include a list of them in a special node reserved for the purpose of logging these additions. The default name for that node is defined in Web::Definitions as $DF_WHATSNEW_NODE_NAME; it can be overridden with the optional second argument. Example usage:

   $dfh->log_new_names( \@new_node_names, $log_node );

generate_contents_node - re-generate the "contents" node (i.e. the table of contents), a listing of all nodes found in the output location. The default name for that node is defined in Web::Definitions as $DF_CONTENTS_NODE_NAME; it can be overridden with the optional argument. This method reads in the existing contents file first, updates it, and writes it out again: this ensures that next and prev links remain intact. The main body of the node is replaced by a new list describing the current browse sequence. Example usage:

   $dfh->generate_contents_node( $contents_node );

TODO

Add something to SYNOPSIS about this:

   ($node_name, $next_name, $prev_name) = $io_handle->input_meta_info($node_name, $location)

   ($node_name, $next_name, $prev_name) = $io_handle->input_meta_info_from_input($node_name)

   ($node_name, $next_name, $prev_name) = $io_handle->input_meta_info_from_output($node_name)

SEE ALSO

Project Documentation

Web::Chain

Web::Chain::IO::Rawtext

Web::Chain::IO::Html

AUTHOR

Joseph Brenner, <doom@kzsu.stanford.edu>

COPYRIGHT AND LICENSE

Copyright (C) 2004 by Joseph Brenner

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.2 or, at your option, any later version of Perl 5 you may have available.

BUGS

None reported... yet.
