Web::Chain project:    Web/Definitions.pm


package Web::Definitions;
#                                doom@kzsu.stanford.edu
#                                August 03, 2004  # Rev: October 5, 2004

=head1 NAME

Web::Definitions - defines constants (now in the form of exported 
                   variables) in use throughout the doomfiles tools

=head1 SYNOPSIS

   use Web::Definitions qw($DF_NODE_NAME_PINNED_RULE);

   unless( $name =~ $DF_NODE_NAME_PINNED_RULE ) {
     die "$name is not a well-formed DF node name";
   }

=head1 DESCRIPTION

A set of project-wide definitions (file locations, special node 
names, and regular expressions) for doomfiles work.

=cut 

use 5.006;
use strict; 
use warnings;
use Carp;

require Exporter;

our @ISA = qw(Exporter);


=head2 EXPORT

This module scans its own source at compile time and automatically 
adds candidates for export to the @EXPORT_OK list (nothing is 
exported by default).  It picks up:

  1. all constants
  2. all *_RULE (or *_rule) variables (declared with 'our')
  3. all UPPERCASE variables (declared with 'our')
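
The scan can be tried out on its own.  The following is a minimal, 
self-contained sketch: the helper scan_exports() and the sample 
declaration lines are hypothetical, while the three regular 
expressions are copied from the BEGIN block below.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the module's BEGIN-block scan:
# returns the names that would be added to @EXPORT_OK.
sub scan_exports {
  my @found;
  for my $line (@_) {
    if ( $line =~ m{^ \s* use \s+ constant \s+ (.*?) \s }x ) {
      push @found, $1;                              # constant names
    } elsif ( $line =~ m{^ \s* our \s+ (.*?_RULE) \s }ix ) {
      push @found, $1;                              # *_RULE / *_rule variables
    } elsif ( $line =~ m{^ \s* our \s* ( \$[A-Z0-9_]+ ) \s* = }x ) {
      push @found, $1 unless $1 eq '$VERSION';      # $UPPER_CASE variables
    }
  }
  return @found;
}

my @exports = scan_exports(
  q{use constant PI => 3.14159;},
  q{our $DF_END_RULE = qr{ ^==+ }x;},
  q{our $DF_LOC = '/tmp/doomfiles';},
  q{our $VERSION = '0.08';},
  q{our $df_node_name_char_class_pat = '[0-9cA-Z_-]';},
  q{# our $COMMENTED_OUT = 1;},
);
print "@exports\n";   # prints "PI $DF_END_RULE $DF_LOC"
```

$VERSION is deliberately skipped. Lowercase *_pat strings and 
commented-out declarations are not picked up.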

The :all tag can be used to import all exports from this file:

  use Web::Definitions qw(:all);

though this is not recommended.

It's better to just import the ones you need, chosen from the following:

=over

=cut

our @export_list;
BEGIN {
  # Scan this file's own source for names to add to @EXPORT_OK.
  my $filename = __FILE__;
  open my $me, '<', $filename or die "Can't open $filename for input: $!";
  while ( my $line = <$me> ) {
    # stop at the end of the code, so examples in the trailing POD are not picked up
    last if $line =~ /^__END__\b/;
    # get constant names from "use constant" lines
    if ( $line =~ m{^ \s* use \s+ constant \s+ (.*?) \s }x ) {
      push @export_list, $1;
    } elsif ( $line =~ m{^ \s* our \s+ (.*?_RULE) \s }ix ) {        #  *_rule or *_RULE variable names
      push @export_list, $1;
    } elsif ( $line =~ m{^ \s* our \s* ( \$[A-Z0-9_]+ ) \s* = }x ) {  #  $UPPER_CASE vars
      push @export_list, $1 unless ($1 eq '$VERSION'); # skip $VERSION
    }
  }
  close $me;
}
# Additional items manually exported:  (currently, none)
push @export_list, qw(  
                      );
our %EXPORT_TAGS = ( 'all' => \@export_list );
our @EXPORT_OK = ( @{ $EXPORT_TAGS{'all'} } );

=item * B<$DF_VERSION> - project wide version number.

=cut 

our $DF_VERSION = '0.08';
our $VERSION = $DF_VERSION;

=item * B<$DEBUG> - project wide debug flag, turns on lots 
  of excessively verbose reporting, all sent to STDERR.

=cut

our $DEBUG=0;

=item * B<$DOOM_HOME> - home directory of user doom (sometimes 
  this is better than playing with $ENV{'HOME'} because it 
  doesn't change if the script is run as root). 
  
=cut

our $DOOM_HOME = '/home/doom'; 

#--------
# Web oriented locations and patterns

=item * B<$DF_LOC> - location of DF html in staging area (the finished product)

=cut 

our $DF_LOC = $DOOM_HOME . '/End/Stage/Mirthless/doomfiles'; 

=item * B<$DOOM_THOUGHTS> - default source location of DF nodes in progress (rawtext)
   The alternate name $DF_RAWTEXT is also available.

=cut

our $DOOM_THOUGHTS = $DOOM_HOME . '/End/Thought';
our $DF_RAWTEXT = $DOOM_THOUGHTS;


=item * B<$DF_PUSH_RSYNCH_TARGET> - rsync target for pushing the entire 
   project directory out to the public site

=cut

our $DF_PUSH_RSYNCH_TARGET = 'mirthles@shell.grin.net:/usr/home/mirthles/public_html';
# Example: $cmd = "rsync -avz -e ssh $doom_loc mirthles\@shell.grin.net:/usr/home/mirthles/public_html";

=item * B<$DF_PUSH_SCP_TARGET> -  target for individual file transfer (via scp, or "rsync *.html" 
   rather than "rsync <dir>").

=cut

our $DF_PUSH_SCP_TARGET = "mirthles\@shell.grin.net:/usr/home/mirthles/public_html/doomfiles";
# Example: `scp $doomdir/$fpat.html mirthles\@shell.grin.net:/usr/home/mirthles/public_html/doomfiles`;

=item * B<$DF_THOUGHTS_TEMP_LOC> - bone pile where used rawtext files 
   are tossed after they have been added to the df html site.

=cut 

our $DF_THOUGHTS_TEMP_LOC = $DOOM_THOUGHTS . '/Out';

=item * B<$DF_TOPNODE_NAME> - standard name for the first node in the linked-list 
  of DF files. 

=cut

our $DF_TOPNODE_NAME = 'TOP';

=item * B<$DF_BOTNODE_NAME> - standard name for the last node in the linked-list 
  of DF files. 

=cut

our $DF_BOTNODE_NAME = 'FIN';

# some other special nodes (generated and/or linked-to automatically)

=item * B<$DF_CONTENTS_NODE_NAME> - another standard name for a "special node":
  a table of contents of all existing nodes in the DF project (this is an 
  automatically generated node).

=cut

our $DF_CONTENTS_NODE_NAME = 'CONTENTS';

=item * B<$DF_WHATSNEW_NODE_NAME> - another standard name for a "special node":
  this is a reverse-chronological change log.  Additions to it are generated 
  automatically, but manual annotations are expected.

=cut

our $DF_WHATSNEW_NODE_NAME = 'WHATSNEW';

=item * B<$DF_WHATSNEW_NOW_MARKER> - a special marker string 
   that should always exist in a comment near the top of 
   WHATSNEW.html.  The code that automatically adds new entries 
   to the file searches for this marker.

=cut

our $DF_WHATSNEW_NOW_MARKER = '===NOW MARKER===';  

=item * B<$DF_NEW_NODES_LOG> - standard file that new additions to the 
  DF project were once logged to (this practice is being 
  phased out). 

=cut

our $DF_NEW_NODES_LOG = $DOOM_HOME . '/tmp/doomfile_nodes.log';

#--------
# Web processing oriented patterns
#

=item * B<$df_node_name_quantified_pat> - this variable is 
  essentially the central definition of what constitutes a valid 
  "doomfiles node" name.  These names must be at least three 
  characters long, and typically will be in all UPPER_CASE with 
  underscores as separators though hyphens and numerics are allowed, 
  as well as the lower-case 'c' (which allows names like McCLAUREN).
  This variable is not itself exported, but several of the exported 
  regexp rules below are built from it.
  Example allowed node name:
       McCELTIC-AMERICAN_SOUL_7
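
As a quick illustration, the sketch below copies these two pattern 
strings (rather than loading the module) and checks a few candidate 
names against the pattern pinned at both ends.  Apart from the 
example name above, the candidate names are made up.

```perl
use strict;
use warnings;

# Local copies of the two definitions (not loaded from the module).
my $df_node_name_char_class_pat = '[0-9cA-Z_-]';
my $df_node_name_quantified_pat = $df_node_name_char_class_pat . '{3,}';

# Pinned at both ends, so the whole string must be a node name.
my $pinned = qr{ ^ $df_node_name_quantified_pat $ }x;

my %is_valid;
for my $name ( 'McCELTIC-AMERICAN_SOUL_7', 'TOP', 'NO', 'Dreams', 'FOO BAR' ) {
  $is_valid{$name} = ( $name =~ $pinned ) ? 1 : 0;
}
# Valid: McCELTIC-AMERICAN_SOUL_7 and TOP.
# Invalid: NO (too short), Dreams (lowercase letters other than 'c'),
# and 'FOO BAR' (contains a space).
```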

=cut

our $df_node_name_char_class_pat = '[0-9cA-Z_-]'; 
our $df_node_name_quantified_pat = $df_node_name_char_class_pat . '{3,}'; 

=item * B<$DF_NODE_NAME_RULE> - doomfiles node name pattern, 
  without any pinning at the beginning or end (with '^' or '$').
  This is the regexp rule equivalent of $df_node_name_quantified_pat.

=cut 

our $DF_NODE_NAME_RULE = qr{ $df_node_name_quantified_pat }x;


=item * B<$DF_NODE_NAME_PINNED_RULE> - doomfiles node name pattern, 
  pinned at both ends (used for verifying that a string contains 
  a valid node name and nothing else). 

=cut 

our $DF_NODE_NAME_PINNED_RULE = qr{ ^ $df_node_name_quantified_pat $ }x; # Note, begin and end pinning


=item * B<$DF_DESTINATION_RULE> - doomfiles node name pattern, pinned at the 
beginning of the string via '^', with the name captured to $1.  It is used 
to extract a node name from a line of text when the name is up against the 
left margin.  The right side is pinned with a zero-width lookahead for 
whitespace (\s) or the end of the line ($).
This right-side pinning is better than simple greedy matching because 
it avoids a minor problem with false positives: given "FALSE*NAMES", 
the rule should not capture "FALSE", but should instead report that 
there is no valid match there.
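
Here is a small self-contained sketch of the rule's behavior.  The 
pattern is copied from the definition below rather than imported, and 
the sample lines are hypothetical.  Each line is matched on its own, 
because '^' is not in multi-line mode here.

```perl
use strict;
use warnings;

# Local copy of the rule (not imported from the module).
my $df_node_name_quantified_pat = '[0-9cA-Z_-]{3,}';
my $DF_DESTINATION_RULE = qr{ ^                                # label at the left margin
                              ( $df_node_name_quantified_pat ) # node name, capture group 1
                              (?= \s | $ )                     # whitespace or EOL must follow
                            }x;

my @labels;
for my $line ( "DREAMS\n", "COUCH    a note after the label", "FALSE*NAMES", "  INDENTED" ) {
  push @labels, ( $line =~ $DF_DESTINATION_RULE ) ? $1 : '(no match)';
}
print join( ', ', @labels ), "\n";   # DREAMS, COUCH, (no match), (no match)
```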

=cut

#our $doomfiles_node_name_pat = "^[0-9cA-Z_-]{3,}"; # Note: half-pinned with '^'. Do not pin this with '$'
#our $doomfiles_node_name_pat = "^[0-9cA-Z_-]{3,}(?=\\s|\$)"; # Note: half-pinned with '^'. Do not pin this with '$'
#our $DF_DESTINATION_RULE = qr{ $doomfiles_node_name_pat }x;

our $DF_DESTINATION_RULE = qr{ ^                                # Labels are up against left margin
                             (                                # Capturing to $1
                                $df_node_name_quantified_pat  # i.e. [0-9cA-Z_-]{3,}           
                             )
                             (?= \s | $ )                     # space or EOL to pin the pattern
                            }x;

=item * B<$DF_GENERAL_NAME_RULE> - A simple, very liberal rule that matches 
   both links and destination labels: any run of node-name characters 
   bounded by word boundaries (\b), captured to $1.
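
For example, in the sketch below (the rule is copied locally, not 
imported), a list-context //g match collects every name in a string.  
That includes both halves of "FALSE*NAMES", which $DF_DESTINATION_RULE 
rejects.

```perl
use strict;
use warnings;

# Local copy of the rule (not imported from the module).
my $df_node_name_quantified_pat = '[0-9cA-Z_-]{3,}';
my $DF_GENERAL_NAME_RULE = qr{
                                \b
                                ( $df_node_name_quantified_pat )   # node name, capture group 1
                                \b
                             }smx;

# In list context, //g returns every captured name.
my @names = ( "link to DREAMS and COUCH here" =~ /$DF_GENERAL_NAME_RULE/g );
my @loose = ( "FALSE*NAMES" =~ /$DF_GENERAL_NAME_RULE/g );
print "@names | @loose\n";   # DREAMS COUCH | FALSE NAMES
```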

=cut

our $DF_GENERAL_NAME_RULE = qr{ 
                                 \b
                                 ( $df_node_name_quantified_pat )   # doomfile node name, captured to 1
                                 \b
                              }smx;

=item * B<$doomfiles_thoughts_node_separator_pat> - detects a line that 
consists of a bar of two or more equal signs beginning in the first 
column (e.g. '==='), possibly followed by trailing whitespace.  This is 
used in the "rawtext" doomfiles format to indicate the end of a node.  
Used by $DF_END_RULE below.
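
The sketch below is a quick check of which lines count as separators. 
It uses a local copy of the pattern, compiled the same way as 
$DF_END_RULE, and the sample lines are made up.

```perl
use strict;
use warnings;

# Local copies, compiled the same way as $DF_END_RULE.
my $doomfiles_thoughts_node_separator_pat = '^==+\s*$';
my $DF_END_RULE = qr{ $doomfiles_thoughts_node_separator_pat }x;

my @lines = ( '===', '==', '=', '   ===', "=====   \n", '=== end' );
my @is_separator = map { ( $_ =~ $DF_END_RULE ) ? 1 : 0 } @lines;
print "@is_separator\n";   # 1 1 0 0 1 0
```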

=cut 

our $doomfiles_thoughts_node_separator_pat = '^==+\s*$';

#--------
# Web::Chain::IO::Output::Html
# (via txt2html in Web::Pro::HtmlOutput)

# (TODO: Why is the above label here?  The following has nothing 
#  to do with it... straighten out these comments.)


#--------
# Web::Chain::IO::Input::Rawtext
#

=item * B<$DF_THOUGHTS_LINK_RULE> - Tries to identify links embedded in 
  the rawtext source files (sometimes called "Thoughts") without 
  getting confused by incidental use of uppercase strings in the 
  text.  Doomfile-style links are distinguished by the whitespace 
  that surrounds them: roughly, at least two spaces before and 
  after, where the end of the line (and possibly the beginning?)
  can be thought of as a chunk of virtual spaces.  
  (Getting this to work right on all corner cases is a 
  surprisingly difficult problem.)
  This version is a quickie that has bugs in identifying a link 
  near the end of the line without trailing spaces.  
  It keeps things simple by capturing the leading and trailing 
  whitespace along with the link, so using it in an s/// requires 
  building the replacement from $1, $2 and $3.
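
Here is a sketch of such an s///.  The rule is copied locally, and the 
anchor markup in the replacement is a hypothetical example, not the 
project's actual html output.  The sketch also shows the cost of 
consuming the whitespace: a link that directly follows another one 
has lost its leading spaces and is skipped.

```perl
use strict;
use warnings;

# Local copy of the rule (not imported from the module).
my $df_node_name_quantified_pat = '[0-9cA-Z_-]{3,}';
my $DF_THOUGHTS_LINK_RULE = qr{
  ( [\x20\t][\x20\t] )                 # two leading spaces or tabs (group 1)
  ( $df_node_name_quantified_pat )     # the node name (group 2)
  (  [\x20\t][\x20\t]                  # group 3: two spaces or tabs, or...
   | [\x20\t] $                        #   one space then EOL, or...
   | $                                 #   EOL
  )
}msx;

# Hypothetical markup: wrap each link in an anchor and put the
# captured whitespace back on either side of it.
( my $line = "I dreamt  DREAMS  then  COUCH" ) =~
  s{$DF_THOUGHTS_LINK_RULE}{$1<A HREF="$2.html">$2</A>$3}g;

# The trailing spaces of the first link are consumed by the match,
# so the second of two adjacent links is never seen.
( my $pair = "  DREAMS  COUCH" ) =~
  s{$DF_THOUGHTS_LINK_RULE}{$1<A HREF="$2.html">$2</A>$3}g;

print "$line\n$pair\n";
```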

=cut 


our $doomfiles_thoughts_link_pat = '([\ ]{2,})(' . $df_node_name_quantified_pat . ')([\ ]{2,}|$)'; 

# our $DF_THOUGHTS_LINK_RULE = qr{ $doomfiles_thoughts_link_pat }x;  # $1 and $3: whitespace, $2: node_name
# TODO BUG The above, which is in "production" 
# doesn't work with the eol case at all (that's TO_HELL
# in the test case in the *.t).
# Does this fix it?

our $DF_THOUGHTS_LINK_RULE =   # $1 and $3: whitespace, $2: node_name
qr{
  ( [\x20\t][\x20\t] )                 # two leading spaces, captured to $1
                                       #    Note: requires "fixed string" (char class okay)
  ( $df_node_name_quantified_pat )     # doomfile node name, captured to $2
  (  [\x20\t][\x20\t]                  #   2 spaces or...            \
   | [\x20\t] $                        #   1 space then eol or...     > captured to $3
   | $                                 #   eol                       /
   )
 }msx;


=item * B<$DF_EMBEDDED_LINK_SINGLE_CAPTURE_RULE> - Tries to identify 
  links embedded in the rawtext source files (sometimes called 
  "Thoughts") without getting confused by incidental use of uppercase 
  strings in the text.  
  Doomfile-style links are distinguished by the whitespace 
  that surrounds them: roughly, at least two spaces before and 
  after, where the end of the line (and possibly the beginning?)
  can be thought of as a chunk of virtual spaces. 
  (Getting this to work right on all corner cases is a 
  surprisingly difficult problem: I'm willing to compromise 
  on the left-hand boundary, and say "no links allowed 
  without at least two spaces from the left margin", 
  but getting the right side to work could be a problem.)
  This is much like $DF_THOUGHTS_LINK_RULE, except that 
  this version is an attempt at capturing I<only> the link itself
  to $1, using zero-width patterns to identify the whitespace.
  Also, this pattern is intended to handle more cases. 
  It is also available under the shorter name $DF_EMBEDDED_LINK_RULE.
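
By contrast, consider the following sketch using this rule (copied 
locally; the sample text is made up).  The surrounding whitespace is 
only checked, not consumed, so two adjacent links are both found, and 
a list-context //g match returns just the names.

```perl
use strict;
use warnings;

# Local copy of the rule (not imported from the module).
my $df_node_name_quantified_pat = '[0-9cA-Z_-]{3,}';
my $DF_EMBEDDED_LINK_SINGLE_CAPTURE_RULE = qr{
  (?<= [\x20\t][\x20\t] )              # lookbehind: two spaces or tabs
  ( $df_node_name_quantified_pat )     # the node name (capture group 1)
  (?=                                  # lookahead for...
     [\x20\t][\x20\t]                  # two spaces or tabs, or...
   | [\x20\t] $                        # one space then EOL, or...
   | $                                 # EOL
  )
}smx;

# Adjacent links: both are found.
my @links = ( "  DREAMS  COUCH" =~ /$DF_EMBEDDED_LINK_SINGLE_CAPTURE_RULE/g );

# A single space before an uppercase word does not qualify.
my @none = ( "I saw a UFO today" =~ /$DF_EMBEDDED_LINK_SINGLE_CAPTURE_RULE/g );

print "@links\n";   # DREAMS COUCH
```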

=cut 

our $DF_EMBEDDED_LINK_SINGLE_CAPTURE_RULE = qr{ 
  (?<= [\x20\t][\x20\t] )              # zero-width pos lookbehind for two spaces
                                       #    Note: requires "fixed string" (char class okay)
  ( $df_node_name_quantified_pat )     # doomfile node name, captured to 1
  (?=                                  # zero-width positive lookahead for...
     [\x20\t][\x20\t]                  # 2 spaces or...
   | [\x20\t] $                        # 1 space then eol or... 
   | $                                 # eol
   )
 }smx;

our $DF_EMBEDDED_LINK_RULE = $DF_EMBEDDED_LINK_SINGLE_CAPTURE_RULE;


our $DF_NODE_RULE = qr{ $df_node_name_quantified_pat }x;  # Note: no '^' pinning.  On its way to being deprecated.

# Note: $DF_START_RULE is now simply an alias for $DF_DESTINATION_RULE.
# Greedy matching ensures the whole name is captured, and the name must
# be followed by whitespace or the end of the line; anything after that
# (a comment?) is ignored (an unused feature).

# our $DF_START_RULE = qr{ ^ # link destinations have labels at start of line
#                          ($DF_NODE_RULE)  # capture name to $1
#                         }x;
our $DF_START_RULE = $DF_DESTINATION_RULE;


our $DF_END_RULE = qr{ $doomfiles_thoughts_node_separator_pat }x;  

#--------
# Web::Chain::IO::Input::Html
# Html Format Web crunching 
#

=item * B<$DF_EXTRACT_NEXT_NODE_RULE>, B<$DF_EXTRACT_PREV_NODE_RULE>, B<$DF_EXTRACT_BODY_RULE> - 
  These rules scrape the "next node" and "previous node" names, and the main 
  body of content, out of the finished DF html files.
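
The sketch below applies the three rules to a small, made-up page for 
a node named SOUL_7, laid out like the sample html shown in the 
comments below.  The rules are copied locally rather than imported.

```perl
use strict;
use warnings;

# Local copies of the module's definitions (not loaded from it).
my $df_node_name_quantified_pat = '[0-9cA-Z_-]{3,}';
my $df_extract_next_node_pat =
  '">\[NEXT\ -\ (' . $df_node_name_quantified_pat . ')\]</A>';
my $df_extract_prev_node_pat =
  '">\[PREV\ -\ (' . $df_node_name_quantified_pat . ')\]</A>';
my $DF_EXTRACT_NEXT_NODE_RULE = qr{ $df_extract_next_node_pat }x;
my $DF_EXTRACT_PREV_NODE_RULE = qr{ $df_extract_prev_node_pat }x;
my $DF_EXTRACT_BODY_RULE = qr{ </H1> \s* <PRE> \s* $      # PRE block after the title
                               (.*?)                      # the body, captured
                               ^--+ \s*                   # line of hyphens, then...
                               (?: <A \s+ HREF [^[]+ \[NEXT \s+ -   # the NEXT link, or
                                 | </PRE> )                         # the closing tag
                             }msx;

# A made-up page, one html line per list element.
my $html = join "\n",
  '<HTML><HEAD>',
  '<TITLE>The doomfiles - SOUL_7.html</TITLE>',
  '</HEAD><BODY>',
  '<PRE>    <A HREF="MAGIC.html">[PREV - MAGIC]</A>    <A HREF="TOP.html">[TOP]</A></PRE>',
  '<H1>SOUL_7</H1>',
  '',
  '<PRE>',
  '',
  '     Body text of the node.',
  '',
  '--------',
  '',
  '<A HREF="COUCH.html">[NEXT - COUCH]</A>',
  '</PRE></BODY></HTML>',
  '';

my ($next) = $html =~ $DF_EXTRACT_NEXT_NODE_RULE;   # "COUCH"
my ($prev) = $html =~ $DF_EXTRACT_PREV_NODE_RULE;   # "MAGIC"
my ($body) = $html =~ $DF_EXTRACT_BODY_RULE;        # text between PRE and the hyphens
```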

=cut 

our $df_extract_next_node_pat = '">\[NEXT\ -\ (' . $df_node_name_quantified_pat . ')\]</A>';
our $DF_EXTRACT_NEXT_NODE_RULE = qr{ $df_extract_next_node_pat }x;  

our $df_extract_prev_node_pat = '">\[PREV\ -\ (' . $df_node_name_quantified_pat . ')\]</A>';
our $DF_EXTRACT_PREV_NODE_RULE = qr{ $df_extract_prev_node_pat }x; 

our $DF_EXTRACT_BODY_RULE = qr{ </H1> \s* <PRE> \s* $                     # Starts with <PRE> block after title (</H1>)
                                (.*?)                                     # Capture all text up to 
                                ^--+ \s*                                  # The line of hyphens before... 
                                (?:
                                    <A \s+ HREF [^[]+ \[NEXT \s+ -        # the NEXT link or ... 
                                  | </PRE>                                # the closing </PRE> tag (FIN.html has no NEXT link)
                                 )
                              }msx;

# An example of the field of html that the above rule works on:
#
#    <HTML><HEAD>
#    <TITLE>The doomfiles - DREAMS.html</TITLE>
#    </HEAD><BODY>
#    <PRE>                                <A HREF="MAGIC.html">[PREV - MAGIC]</A>    <A HREF="TOP.html">[TOP]</A></PRE>
#    <H1>DREAMS</H1>
#
#
#    <PRE>     
#
#         I don't remember looking in   
#         their direction either.       
#        
#        
#    --------                      
#                                
#    <A HREF="COUCH.html">[NEXT - COUCH]</A>
#    </PRE></BODY></HTML>


#--------
# Patterns to extract <TITLE> and <H1> strings.

=item * B<$DF_EXTRACT_TITLE_RULE>, B<$DF_EXTRACT_H1_RULE> - 
 These patterns extract the <TITLE> and <H1> strings 
 from finished DF html files.
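
Here is a brief sketch with local copies of the two patterns and a 
made-up page.  Neither pattern uses the /s flag, so a title or heading 
split across lines is not extracted.

```perl
use strict;
use warnings;

# Local copies of the two rules (not imported from the module).
my $DF_EXTRACT_TITLE_RULE = qr{<TITLE.*?>(.*?)</TITLE>}i;
my $DF_EXTRACT_H1_RULE    = qr{<H1.*?>(.*?)</H1>}i;

my $html = '<HTML><HEAD><TITLE>The doomfiles - SOUL_7.html</TITLE></HEAD>'
         . '<BODY><H1>SOUL_7</H1></BODY></HTML>';

my ($title) = $html =~ $DF_EXTRACT_TITLE_RULE;   # "The doomfiles - SOUL_7.html"
my ($h1)    = $html =~ $DF_EXTRACT_H1_RULE;      # "SOUL_7"

# "." does not match a newline here, so this heading is not found.
my ($split) = "<H1>SOUL\n_7</H1>" =~ $DF_EXTRACT_H1_RULE;   # undef
```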

=cut 

our $DF_EXTRACT_TITLE_RULE = qr{<TITLE.*?>(.*?)</TITLE>}i;
our $DF_EXTRACT_H1_RULE = qr{<H1.*?>(.*?)</H1>}i;

#--------
# A splicing technique
# 
# Add these patterns too?
# # Change PREV unless at TOP
#     $text =~ 
# 	s{<A HREF="([^\.]*)\.html">\[PREV - \1\]</A>}
#          {<A HREF="$prevnode.html">[PREV - $prevnode]</A>}
#        unless ($file eq $top);
# # Change NEXT unless at BOT
#     $text =~ 
# 	s{<A HREF="([^\.]*)\.html">\[NEXT - \1\]</A>}
#          {<A HREF="$nextnode.html">[NEXT - $nextnode]</A>}
#        unless ($file eq $bot);


#--------
# More Rawtext processing 
# 

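=item * B<$DF_THOUGHTS_NODE_HEADER_RULE> - 
  Intended for use in an m//msg match, to extract all the new 
  doomfiles nodes in a Rawtext (aka 'Thoughts') file.  (Currently 
  not used.)  Despite its name, this is a plain string rather than 
  a qr// object.  As written it cannot match: the node name would 
  have to start right after the '$' that closes the capture group, 
  and only a newline or the end of the string can follow that anchor.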

=cut


our $DF_THOUGHTS_NODE_HEADER_RULE =
   $doomfiles_thoughts_node_separator_pat . 
   '(^\s*$)' .
   '(' . $df_node_name_quantified_pat . ')';


1;

__END__

=back

=head1 DISCUSSION

=head2 STYLE

(1) Avoid creating a lot of built-up definitions to be 
exported.  E.g. something like this isn't a good idea:

    our $DF_TOPNODE = $DF_LOC . '/' . $DF_TOPNODE_NAME . '.html'; 

because code that uses such a $DF_TOPNODE can't be tested 
very well: it always accesses the live location $DF_LOC.
Better to do this sort of build-up in the code that 
uses these definitions.
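
The following is a minimal sketch of that approach.  The helper 
topnode_path() is hypothetical.  Production code would pass it 
$DF_LOC and $DF_TOPNODE_NAME, while a test can pass a scratch 
directory from File::Temp.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical helper: the caller supplies the location, so nothing
# here is tied to the live site.
sub topnode_path {
  my ( $loc, $topnode_name ) = @_;
  return File::Spec->catfile( $loc, "$topnode_name.html" );
}

# In a test, point it at a temporary directory (removed at exit).
my $test_loc = tempdir( CLEANUP => 1 );
my $path     = topnode_path( $test_loc, 'TOP' );

open my $fh, '>', $path or die "Can't write $path: $!";
print {$fh} "<HTML></HTML>\n";
close $fh;
```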

(2) Use names like "*_pat" for strings that contain regular
expressions, and "*_RULE" for actual regexp objects created
with qr{}.  Export all "*_RULE"s automatically.
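
The sketch below makes the distinction concrete (the variable names 
are illustrative).  A *_pat string can be glued to others with '.', 
and the result is compiled once into a *_RULE object.

```perl
use strict;
use warnings;

my $name_pat   = '[0-9cA-Z_-]{3,}';         # a *_pat: a plain string
my $pinned_pat = '^' . $name_pat . '$';     # strings compose by concatenation
my $NAME_RULE  = qr{$pinned_pat};            # a *_RULE: a compiled qr{} object

print ref($NAME_RULE), "\n";                 # prints "Regexp"
my $ok  = ( 'TOP'     =~ $NAME_RULE ) ? 1 : 0;
my $bad = ( 'TOP FIN' =~ $NAME_RULE ) ? 1 : 0;
```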

(3) Note that this module automatically makes every variable whose 
name is all uppercase available for export (via @EXPORT_OK).
This lessens the chance of collision with other variables 
and makes the exports visually resemble constants even though 
they're not.

=head2 CONSTANT IRRITATION

Perl constants have little going for them: their only
real advantage is that the compiler can inline them.

Well, there's also the fact that they're I<constant>, and
putting a value into that kind of straitjacket might help
prevent a shoot-yourself-in-the-foot problem, but unlike
most such things in perl, there is no easy way to escape the
straitjacket later if you really need to.  There is no
C<{ no strict 'constants'; ... }>.

It may sound odd, but it isn't unusual to want to
temporarily change a "constant": e.g. to override a
project-level $DEBUG flag, setting it temporarily just for
the current module.
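
With an exported variable instead of a constant, Perl's local gives 
a temporary override that is undone automatically when the enclosing 
block exits.  The following is a sketch with $Web::Definitions::DEBUG 
set directly (the module itself is not loaded) and a hypothetical 
debug_note() routine.

```perl
use strict;
use warnings;

# Stand-in for the module's flag; the real module is not loaded here.
$Web::Definitions::DEBUG = 0;

# Hypothetical routine that consults the project-wide flag.
sub debug_note {
  my ($msg) = @_;
  return $Web::Definitions::DEBUG ? "DEBUG: $msg" : '';
}

my $before = debug_note('quiet');
my $during;
{
  # Dynamically scoped override, restored when this block exits.
  local $Web::Definitions::DEBUG = 1;
  $during = debug_note('verbose');
}
my $after = debug_note('quiet again');
```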

Also on the negative side, constants don't really interpolate, 
not even if you try fugly tricks like:

   print "My constant: &CONSTANT() \n";

Though, if you want to get even fuglier, you I<can> do this:

   print "My constant: @{[ CONSTANT ]} \n";

Why use perl at all if you're going to, uh, constantly 
do things like this:

   print "My constant: " . CONSTANT . "\n";

So: I've stopped using (and exporting) constants.  
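
The interpolation behavior described above can be checked directly.  
In this sketch GREETING is a constant subroutine with an empty 
prototype.  In these string contexts it behaves the same way as a 
constant made with the constant pragma.

```perl
use strict;
use warnings;

# A constant subroutine: the empty prototype makes it inlinable.
sub GREETING () { 'hello' }

my $amp    = "say &GREETING() here";        # not interpolated at all
my $fugly  = "say @{[ GREETING ]} here";    # works, but fugly
my $concat = "say " . GREETING . " here";   # the usual workaround

# An ordinary variable (what this module exports instead) just interpolates.
my $GREETING = 'hello';
my $plain    = "say $GREETING here";
```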

=head1 SEE ALSO

L<Project Documentation|Web::Project>

=head1 AUTHOR

Joseph Brenner, E<lt>doom@kzsu.stanford.eduE<gt>

=head1 COPYRIGHT AND LICENSE

Copyright (C) 2004 by Joseph Brenner

This library is free software; you can redistribute it and/or modify
it under the same terms as Perl itself, either Perl version 5.8.2 or,
at your option, any later version of Perl 5 you may have available.

=head1 BUGS

None reported... yet.

=cut


     

Joseph Brenner, Sat Nov 6 17:04:11 2004