Web::Chain project: Web/Project.pod
=pod
=head1 NAME
Project.pod -- Overall documentation for the Web::Chain project.
Version 0.8
=head1 INTRODUCTION
The Web::Chain project is a set of OO modules that implement
utilities to manage a particular type of hypertext,
where a series of pages (or "nodes") are connected in a
fixed linear sequence (really, a bi-directional linked-list).
This is sometimes called a "browse sequence": it gives
readers the option of reading straight through the content
as though it were a book, one page after another.
=head1 DESIGN
=head2 Classes
=over
=item B<Web::Node> - roughly equivalent to a "web page", with pointers
to the next and previous nodes in a sequence or "chain".
=item B<Web::Chain> - A sequence of nodes, largely just pointers to
the beginning and end nodes.
=item B<Web::Chain::IO> - The IO object is a handle for input and output of
information into a Chain from different data formats.
This is an "interface" only, which does no implementation.
Methods exist to choose data formats, and when that is
done, a module to handle those formats is dynamically
loaded. The IO class itself contains very little
besides wrapper functions that pass requests to a
data format module, and return the results.
=item B<Web::Chain::IO::Rawtext> - An example of a dynamically
loaded data format. A single Rawtext file may contain
many nodes, which are just blocks of text separated
by a bar of equal signs. Each node begins with a node name against
the left margin (in the first column), and any node
may contain links to such node names by
embedding them in the text with a certain
amount of white space around them. (Note that
node names must fit a particular pattern;
typically they look like: NODE_NAME_1).
=item B<Web::Chain::IO::Html> - Another example of a data format.
A representation of a chain of nodes as a series of
html pages with html jumps labeled NEXT and PREVIOUS
connecting the pages.
=item B<Web::Chain::IO::Common> - A base class that dynamically
loaded data format classes inherit from. This
is a compromise on the main design concept: it
is a convenient place to put shared code
(largely argument checking methods, which suggests
that it might more properly be thought of as part
of the "interface": future versions may blend this
module into the IO class).
=item B<Web::Chain::Pro> - Another compromise: this is a module
full of straight functions (non-OOP code) that
were largely written to work with the older
procedural code base.
=back
=head1 THEORY
The usual form of inheritance -- these days typically called
"implementation inheritance" -- has some well known problems, to
the point where the Gang of Four can just recite the slogan "inheritance
breaks encapsulation" and expect everyone to know what they're
talking about.
My take on this problem is that with a typical
inheritance-based design, you end up with a tree
where every branch becomes a single, massive
logical unit. In order to make any use of a subclass at the
bottom of the chain of "is a" relationships, you typically
need to understand all of the classes above it. You end
up with "modules" without modularity, at least from the point
of view of someone trying to use the code later. Inheritance
can often make it easier to create variant versions of
existing code in the first place, but that's a dangerous
asymmetry: easy to write, but hard to understand later.
The general prescriptions to get around this problem are:
(a) use aggregation in preference to inheritance (My
tentative rule-of-thumb: reserve implementation inheritance
for quick fixes of early design errors).
(b) use "interface inheritance" rather than "implementation
inheritance".
=head2 Interface Inheritance
With "interface inheritance" an abstract base class is
used to specify the behavior of a type of object, and different possible
implementations of that behavior inherit this spec from the abstract
class. This is intended to make it possible to write new forms of the
object that existing code can use without modification.
It's often said that perl "does not support interface inheritance", but
that, of course, just sounds like a challenge to a typical programmer.
One of the purposes of this project was to implement a variety
of interface inheritance in perl, and see how well it could
be made to work.
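One common way to approximate interface inheritance in perl is an abstract base class whose methods die unless a subclass overrides them. The following is only a minimal sketch of that general idea, not the code used in this project: the package names C<My::Format> and C<My::Format::Upper> and the method C<render> are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical abstract base class: it specifies the interface only.
package My::Format;

sub new {
    my ($class, %args) = @_;
    return bless { %args }, $class;
}

# Any method listed here must be supplied by the implementing class.
sub render {
    my $self = shift;
    die ref($self) . " must implement render()\n";
}

# Hypothetical implementation that inherits the spec and fills it in.
package My::Format::Upper;
our @ISA = ('My::Format');

sub render {
    my ($self, $text) = @_;
    return uc $text;
}

package main;

# Existing code relies only on the interface, not on the implementation.
sub show {
    my ($format, $text) = @_;
    return $format->render($text);
}

my $shown   = show( My::Format::Upper->new, "next" );
my $ok      = eval { show( My::Format->new, "next" ); 1 };
my $failure = $ok ? '' : $@;
```

Code written against C<show> keeps working when a new format class is added, provided the new class supplies C<render>. Calling the base class directly fails with a message rather than silently doing nothing.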
=head2 Aggregation
In a typical OOP design, each object has an associated
type, and along with it you get a set of methods designed
to work on this type. While a single choice (e.g. the data
format you intend to work on) can be easily encoded as a
single object type, it's less clear what the right way is to
achieve polymorphism when there are two (or more) independent
choices that need to be made (e.g. in our case, the input and
the output data formats).
One way of doing it might be multiple inheritance, but that's
frowned on even more strongly than implementation inheritance.
It could be done with two layers of inheritance:
  output_enabled_object "is a",
  input_enabled_object "is a",
  storage_and_manipulation_object
But long chains of inheritance can be clumsy (as is
discussed above).
Instead, this design works with "has a" relationships: each
Web::Chain::IO object has an input and output handle, though
the actual code that implements these handles is loaded
dynamically when the choice of data format is made.
The Web::Chain::IO class just specifies the interface.
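The "has a" arrangement with on-demand loading might be sketched roughly as follows. This is an illustration under assumptions, not the actual Web::Chain::IO code: the packages C<My::IO>, C<My::IO::Lines> and C<My::IO::Csv>, and every method name shown, are hypothetical. The two handlers are defined inline so the sketch is self-contained.

```perl
use strict;
use warnings;

# Hypothetical format handlers, defined inline so the sketch is
# self-contained; in practice each would live in its own .pm file.
package My::IO::Lines;
sub new   { return bless {}, shift }
sub parse { my ($self, $text)  = @_; return [ split /\n/, $text ] }
sub emit  { my ($self, $items) = @_; return join "\n", @$items }

package My::IO::Csv;
sub new   { return bless {}, shift }
sub parse { my ($self, $text)  = @_; return [ split /,/, $text ] }
sub emit  { my ($self, $items) = @_; return join ",", @$items }

# Hypothetical wrapper: it "has a" reader and a writer, chosen independently.
package My::IO;

sub new { return bless { in => undef, out => undef }, shift }

# Turn a format name into a handler object, loading its module on demand.
sub _handler {
    my ($self, $format) = @_;
    die "bad format name: $format\n" unless $format =~ /^\w+$/;
    my $class = "My::IO::" . ucfirst( lc $format );
    unless ( $class->can('new') ) {
        ( my $file = "$class.pm" ) =~ s{::}{/}g;
        require $file;    # dies if no such module can be found
    }
    return $class->new;
}

sub input_format  { my ($self, $f) = @_; $self->{in}  = $self->_handler($f); return $self }
sub output_format { my ($self, $f) = @_; $self->{out} = $self->_handler($f); return $self }

# Wrapper methods that just pass the request along to the chosen handler.
sub read_text  { my ($self, $text)  = @_; return $self->{in}->parse($text) }
sub write_text { my ($self, $items) = @_; return $self->{out}->emit($items) }

package main;

my $io = My::IO->new;
$io->input_format('lines')->output_format('csv');
my $converted = $io->write_text( $io->read_text("TOP\nHISTORY\nFIN") );
```

Because the input and output handlers are separate members, any reader can be paired with any writer without a new subclass for each combination.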
=head2 Design Patterns
The Node class is a variation of the "fly-weight" pattern:
A given node may be contained by different chain objects,
but it is always the *same* node, not a copy. Note that
since the next and previous linkage is implemented on the
Node level, there are limitations on how different one
Chain can be from another, if both include some of the same
Nodes.
The interface/implementation division here bears a strong
resemblance to the "Strategy" pattern, in that the higher-
level commands may access different implementations in
the lower level code, depending on circumstances.
=head1 DETAILED DESCRIPTION
This project is intended to facilitate processing a
linear chain of nodes of information, commonly (though
not necessarily) a series of web pages joined in a sequence
by "next" and "previous" links.
While in principle a hypertext document can be thought of
as a series of nodes (or "pages") connected in an arbitrary
manner, early experimentation with hypertext systems
showed a tendency for users to feel "lost in hyperspace".
They would often wander aimlessly without any sense of where
they were in the overall structure.
A simple thing that can be done to combat this is to
organize a hypertext into a linear sequence (i.e. a
"browse sequence").
In the ideal case clicking on the "Next" link should take
you to a page with a thematic connection to the page you've
just been reading, but this is not strictly necessary (and
not always possible: webs don't easily convert to linear form).
The authors and editors working on such projects need tools
to manipulate these chains of nodes. Consider what needs
to be done when you add a new page at some point in the
sequence: the "next" and "previous" links in the new page
must point at two existing nodes, and each of those has a link
that will also need to be updated to point at the new node:
there are four link updates total, in three pages. Moving a
segment from one place to another in the sequence involves six
such link updates, in five pages.
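The four link updates for an insertion can be made concrete with a small sketch. It uses plain hashes rather than Web::Node objects, and the helper names C<make_node>, C<insert_after> and C<names_from> are hypothetical, chosen only for this illustration.

```perl
use strict;
use warnings;

# Hypothetical minimal node: a name plus "next" and "prev" pointers.
sub make_node {
    my ($name) = @_;
    return { name => $name, next => undef, prev => undef };
}

# Insert $new between $before and the node that currently follows it.
# Four link updates, touching three nodes.
sub insert_after {
    my ($before, $new) = @_;
    my $after = $before->{next};
    $new->{prev}    = $before;                   # 1: new page's "previous"
    $new->{next}    = $after;                    # 2: new page's "next"
    $before->{next} = $new;                      # 3: earlier neighbour's "next"
    $after->{prev}  = $new if defined $after;    # 4: later neighbour's "previous"
    return $new;
}

# List node names by following "next" links from $start.
sub names_from {
    my ($start) = @_;
    my @names;
    for ( my $n = $start; defined $n; $n = $n->{next} ) {
        push @names, $n->{name};
    }
    return @names;
}

my $top = make_node('TOP');
my $fin = insert_after( $top, make_node('FIN') );
insert_after( $top, make_node('HISTORY') );

my @order = names_from($top);
```

The "next" and "prev" references point at each other, so in a long-running program one direction would probably need to be weakened (for example with C<Scalar::Util::weaken>) to let perl reclaim the memory.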
These operations can be done manually, but they are far too tedious
and error prone to want to do very many of them that way.
The goal here is to automate the process in a convenient way.
=head2 DF aka "Doomfiles"
My personal interest in this problem arises from a long-standing
hypertext writing project I'm engaged in, which goes by the
unimpressive name of "the doomfiles" (I've used "doom" as a handle
for a long time -- much longer than the silly game has existed --
e.g. I used to use "The Voice of Doom" as a college radio airname).
This project might be compared to a "personal blog", but it
predates the blog era; in fact, it predates the web era. It
was originally just a large file without read-protection, whose
intended audience was other users of Stanford's unix systems.
Hypertext links were then implemented simply as text searches on
upper-case keywords.
I still use this "rawtext" format for writing new DF material,
so it's one of the primary hypertext formats (along with HTML)
that the initial version of this project needed to support.
Later (after the web was invented) I converted this text
format to a series of interconnected web-pages.
So "Rawtext" is yet another alternative method of creating html
files by writing in some "simpler" text-like format, a la all
the different "wikis".
In my defense:
=over
=item 1 I invented this format before html existed.
=item 2 I'm not trying to talk anyone else into using my
format. The following is strictly FMI ("for my information"),
and it beats me why *you* would want to read this.
=back
=head2 "Rawtext" format
The rules of the original 'rawtext' source format are
simple: A link is an upper-case term surrounded by
whitespace. The destination it jumps to is the same
upper-case term flush-left (immediately following a
horizontal divider line of equal-signs, or the top of the
raw text file).
A rawtext file, then, might look something like
the following.
   __________________________________________________
  |TOP                                               |
  |                                                  |
  |   Some discussion of                             |
  |   what this thing is                             |
  |   you're looking at.           HISTORY           |
  |                                                  |
  |                                                  |
  |===                                               |
  |FIRST_THOUGHT                                     |
  |                                                  |
  |   Musing away already                            |
  |   Muse muse muse...    But what                  |
  |   Music or museums?    about?  CAVEATS           |
  |                                                  |
  |===                                               |
  |CAVEATS                                           |
  |                                                  |
  |   Hedging now better                             |
  |   than excuses later.                            |
  |                                                  |
  |===                                               |
  |BIG_TOPIC                                         |
  |                                                  |
  |   Getting into                                   |
  |   something bit    Or relatively  ASIDE          |
  |                    large at any rate.            |
  |                          Something               |
  |                          important.              |
  |                                                  |
  |   Another important                              |
  |   thought...               TANGENTIAL_RAMBLE     |
  |                                                  |
  |===                                               |
  |ASIDE                                             |
  |                                                  |
  |   Does big really matter?                        |
  |                                                  |
  |===                                               |
  |HISTORY                                           |
  |                                                  |
  |   How this got                                   |
  |   started.                                       |
  |                                                  |
  |===                                               |
  |NOTES_FOR_THE_FUTURE                              |
  |                                                  |
  |   Flesh this out                                 |
  |   cut that down.                                 |
  |                                                  |
  |===                                               |
  |TANGENTIAL_RAMBLE                                 |
  |                                                  |
  |   Something completely                           |
  |   irrelevant, as though        CAVEATS           |
  |   that were unusual.                             |
  |                                                  |
  |                                                  |
  |===                                               |
  |FIN                                               |
  |                                                  |
  |   See you.                                       |
  |                                                  |
  |__________________________________________________|
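The rules above are simple enough to sketch as a small parser. This is an illustration only, not the Web::Chain::IO::Rawtext code: the function C<parse_rawtext> is hypothetical, and the node-name pattern (a capital letter followed by one or more capitals, digits or underscores) is an assumption about the "particular pattern" that names must fit.

```perl
use strict;
use warnings;

# Assumed node-name pattern: capitals, digits and underscores,
# starting with a capital, at least two characters long.
my $NAME = qr/[A-Z][A-Z0-9_]+/;

# Split rawtext into nodes; return a list of { name, body, links } hashes.
sub parse_rawtext {
    my ($text) = @_;
    my @nodes;
    # Nodes are separated by a line made only of equal signs.
    for my $block ( split /^=+[ \t]*\n/m, $text ) {
        # The node name sits flush-left on the first line of the block.
        next unless $block =~ /\A($NAME)[ \t]*\n(.*)\z/s;
        my ($name, $body) = ($1, $2);
        # A link is a node name with whitespace (or a line end) on both sides.
        my @links = $body =~ /(?<!\S)($NAME)(?!\S)/g;
        push @nodes, { name => $name, body => $body, links => \@links };
    }
    return @nodes;
}

my $sample = <<'END';
TOP

   Some discussion of
   what this thing is
   you're looking at.          HISTORY

===
HISTORY

   How this got
   started.                    TOP
END

my @nodes    = parse_rawtext($sample);
my %links_of = map { $_->{name} => "@{ $_->{links} }" } @nodes;
```

The body is kept as an uninterpreted string with its whitespace intact, which matches the way a node's content is treated as a blob elsewhere in this design.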
=head2 "Html" format
For our purposes, "Html" format is more rigidly defined than
the usual "html" page would be. The Html format has a header
with a standard layout including a "PREV" link to the previous
page in the sequence, and a standard footer with a "NEXT"
link to the next page in the sequence. The main body of content
is embedded in <PRE></PRE> tags, so that white space
is significant: this allows the use of graphical layout
to suggest the finer grained "hyperlinks" between each
paragraph of the content. Currently, the Web::Node
module has no understanding of this fine-grained content;
it's just treated as a blob.
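A page in this style might be generated along the following lines. This is a rough sketch under assumptions, not the output of Web::Chain::IO::Html: the function names C<escape_html> and C<render_page>, the exact markup, and the convention that a page's file name is its node name plus ".html" are all hypothetical.

```perl
use strict;
use warnings;

# Escape the characters that are special in html text.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# Hypothetical renderer: PREV link in the header, body in <PRE>,
# NEXT link in the footer. File names are assumed to be "NAME.html".
sub render_page {
    my (%node) = @_;
    my $prev = defined $node{prev} ? qq{<A HREF="$node{prev}.html">PREV</A>} : '';
    my $next = defined $node{next} ? qq{<A HREF="$node{next}.html">NEXT</A>} : '';
    my $body = escape_html( $node{body} );
    return join "\n",
        "<HTML><HEAD><TITLE>$node{name}</TITLE></HEAD><BODY>",
        $prev,
        "<PRE>",
        $body,
        "</PRE>",
        $next,
        "</BODY></HTML>",
        "";
}

my $page = render_page(
    name => 'HISTORY',
    prev => 'TOP',
    next => 'FIRST_THOUGHT',
    body => "   How this got\n   started.",
);
```

The body passes through unchanged apart from escaping, so the white space that carries the layout survives inside the PRE element.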
=head2 Motivation
My older scripts for processing doomfiles rawtext and
transforming it into html were always a little buggy and
brittle; hence this more careful re-write, trying to apply
some more modern design principles.
If the job is done right, it should facilitate transforming
this project into other data formats in the future.
Possibly the HTML will become XHTML, possibly there's
a reason to switch to a database-backed design, etc.
=head1 FUTURE DEVELOPMENT
Future development plans include:
=over
=item Phase out old procedural routine, phase in templating with Mason.
The conversion of the main body to html, and the html format
of the headers and footers, is currently handled by some
procedural routines, closely based on the original legacy scripts.
Ultimately, I expect to replace them with a templating system (most
likely Mason, though Text::Template is a candidate).
=item The internal storage format of the body text is xml (or xml-like):
I would like to add more detail to the information stored like this.
Currently a DF node is handled as a blob of text with significant
whitespace, but to the human reader there is internal detail, a branching
structure of "rectangular paragraphs". A logical next step would
be to mark-up each rectpara with location and apparent connection to
the rest, which might increase the range of different possible
output data formats that could be supported.
=back
=head1 SEE ALSO
=over
=item L<Web::Node>
=item L<Web::Definitions>
=item L<Web::Chain::IO>
=item L<Web::Chain::IO::Html>
=item L<Web::Chain::IO::Rawtext>
=item L<Web::Pro::Interact>
=item L<Web::Pro::Transform>
=item L<Web::Pro::HtmlOutput>
=back
=head1 AUTHOR
Joseph Brenner, E<lt>doom@kzsu.stanford.eduE<gt>
=head1 COPYRIGHT AND LICENSE
Copyright (C) 2004 by Joseph Brenner
This document is part of a free software project; you can
redistribute it and/or modify it under the same terms as
Perl itself, either Perl version 5.8.2 or, at your option,
any later version of Perl 5 you may have available.
=head1 BUGS
None reported... yet.
=cut
Joseph Brenner,
Sat Nov 6 17:04:11 2004