Chapter 3 - A View of Documents
Last updated: March 26, 1996
Documents as Categories
What does it mean to move this document to that format? This is the key question we'll address through the framework of category theory. Category theory is a branch of abstract mathematics dealing with the algebra of functions. It is concerned with objects and the mappings between them, which dovetails nicely with the problem of translating documents.
Category Theory: Background
Category theory addresses fundamental structures. It generalizes several familiar mathematical constructs: sets, graphs, groups, and others.
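Pierce [] defines a category:
1.1.1 Definition. A category C comprises:
1. a collection of objects;
2. a collection of arrows (often called morphisms);
3. operations assigning to each arrow f an object dom f, its domain, and an object cod f, its codomain;
4. a composition operator assigning to each pair of arrows f and g with cod f = dom g a composite arrow g o f: dom f -> cod g, satisfying the associative law;
5. for each object A, an identity arrow id_A: A -> A, satisfying the identity law.
Let's look at a few examples. The simplest category is 0. It has no objects and no arrows, so it satisfies conditions 3 through 5 vacuously.
The category 1 has only one object. By condition 5, it must have at least an identity arrow. Having this one arrow trivially satisfies the other conditions. This category looks like this:
   __
  v  \
  o   |
   \_/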
The category 2 has two objects and three arrows, and looks like this:
  __              __
 /  v            v  \
 |  o ---------> o   |
 \_/              \_/
We'll see this one in a later example.
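The three small categories above can be written down as explicit tables and checked mechanically. The sketch below is only an illustration, not something taken from Pierce: the class name FiniteCategory, the arrow names, and the table layout are all assumptions made for this example. It checks the identity and associativity conditions for 0, 1, and 2.

```python
from itertools import product


class FiniteCategory:
    """A finite category given by explicit tables (hypothetical helper).

    objects: list of object names
    arrows:  dict mapping arrow name -> (domain, codomain)
    compose: dict mapping (g, f) -> name of the composite g o f,
             with one entry for every pair where cod f == dom g
    """

    def __init__(self, objects, arrows, compose):
        self.objects = objects
        self.arrows = arrows
        self.compose = compose

    def identity(self, obj):
        # Naming convention assumed here: the identity on A is "id_A".
        return "id_" + obj

    def is_category(self):
        # Condition 5: every object has an identity arrow A -> A.
        for obj in self.objects:
            if self.arrows.get(self.identity(obj)) != (obj, obj):
                return False
        # Condition 4: every composable pair has a composite with
        # the right domain and codomain.
        for g, f in product(self.arrows, repeat=2):
            if self.arrows[f][1] == self.arrows[g][0]:
                h = self.compose.get((g, f))
                if h is None:
                    return False
                if self.arrows[h] != (self.arrows[f][0], self.arrows[g][1]):
                    return False
        # Identity law: f o id = f and id o f = f.
        for f, (dom, cod) in self.arrows.items():
            if self.compose[(f, self.identity(dom))] != f:
                return False
            if self.compose[(self.identity(cod), f)] != f:
                return False
        # Associative law: h o (g o f) = (h o g) o f.
        for h, g, f in product(self.arrows, repeat=3):
            if (self.arrows[f][1] == self.arrows[g][0]
                    and self.arrows[g][1] == self.arrows[h][0]):
                left = self.compose[(h, self.compose[(g, f)])]
                right = self.compose[(self.compose[(h, g)], f)]
                if left != right:
                    return False
        return True


# Category 0: no objects, no arrows.
ZERO = FiniteCategory([], {}, {})

# Category 1: one object and its identity arrow.
ONE = FiniteCategory(
    ["A"],
    {"id_A": ("A", "A")},
    {("id_A", "id_A"): "id_A"},
)

# Category 2: two objects, two identities, and one arrow between them.
TWO = FiniteCategory(
    ["A", "B"],
    {"id_A": ("A", "A"), "id_B": ("B", "B"), "f": ("A", "B")},
    {
        ("id_A", "id_A"): "id_A",
        ("id_B", "id_B"): "id_B",
        ("f", "id_A"): "f",
        ("id_B", "f"): "f",
    },
)

print(ZERO.is_category(), ONE.is_category(), TWO.is_category())
```

Note that 0 passes because every loop runs over an empty collection, which is the "vacuously" of the text.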
Two other important categories are Set and Graph. "The category Set has sets as objects and total functions between sets as arrows." [Pierce, p. 2] "The category Graph has directed multi-graphs as its objects. An arrow f:G->H in Graph is a structure-preserving map between graphs." [Pierce, p. 51]
The next important notion is that of a functor. Again, from Pierce [p. 36]:
2.1.1 Definition. Let C and D be categories. A functor F:C->D is a map taking each C-object A to a D-object F(A) and each C-arrow f:A->B to a D-arrow F(f):F(A)->F(B), such that for all C-objects A and composable C-arrows f and g:
1. F(id_A) = id_F(A)
2. F(g o f) = F(g) o F(f)
Just as an arrow in a category maps between objects, a functor maps between categories. The important thing is that functors preserve structure (in an abstract sense): not only are objects mapped, but arrows as well. As we will see, analyzing how arrows are mapped is a powerful tool.
[[Further discussion TBD. Introduce commuting diagrams. Talk briefly about several concepts: isomorphism, natural transformations, limits, adjoints [canonical constructions], colimits, and comma categories.]] [[Not clear these are really needed.]]
Approach:
Category Theory
A category is a collection of several things: objects, arrows, and a composition operator. Many familiar mathematical objects have a category associated with them. [Examples TBD]
The fundamental approach is to define a pair of categories, and then define a mapping between them, known as a functor. A functor maps both objects and arrows in a way that preserves identities and composition.
For example, consider stacks as one category and integers as another. Glossing over the details, each can be modeled as a category. We will define a functor, down, that maps each stack to its size. In a non-categorical approach, this might be a sufficient description. But note that it only maps the objects, not the arrows. For our stack category, we will consider the arrow push as operating on stacks. The definition of functor requires that we identify a mapping for arrows as well. If we push onto a stack, its size increases by 1, so the corresponding arrow for integers is incr.
What does this give us that a simple size function does not? One nice thing is the ability to do diagram chasing. Consider a mapping:
        S    push    S'
        o ---------> o
        |            |
   down |            | down
        V            V
        o ---------> o
        I    incr    I'
This diagram asserts that down o push = incr o down. It tells us that it doesn't
matter whether we push and then translate down, or translate down first and then
use the increment arrow.
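The square can be tried out on a few concrete stacks. The following is a minimal sketch under stated assumptions: stacks are modeled as Python tuples, and the names down_object, DOWN_ARROW, and square_commutes are hypothetical helpers introduced for this illustration. Checking a handful of samples is a test, not the proof that the diagram expresses.

```python
def push(stack, item="x"):
    """Arrow in the stack category: a new stack with one more element."""
    return stack + (item,)


def incr(n):
    """Arrow in the integer category: add one."""
    return n + 1


def down_object(stack):
    """Object part of the functor: a stack maps to its size."""
    return len(stack)


# Arrow part of the functor: each stack arrow maps to an integer arrow.
DOWN_ARROW = {push: incr}


def square_commutes(stack_arrow, stack):
    """Check down(arrow(S)) == F(arrow)(down(S)) for one stack S."""
    int_arrow = DOWN_ARROW[stack_arrow]
    return down_object(stack_arrow(stack)) == int_arrow(down_object(stack))


samples = [(), ("a",), ("a", "b", "c")]

# One square: push then down equals down then incr.
all_commute = all(square_commutes(push, s) for s in samples)

# Two squares side by side: two pushes correspond to two increments.
two_pushes = all(
    down_object(push(push(s))) == incr(incr(down_object(s)))
    for s in samples
)

print(all_commute, two_pushes)
```

Mapping only objects would amount to down_object alone; the DOWN_ARROW table is what the functor adds.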
The S, S', I, and I' are arbitrary values. We can use this fact to build up complex diagrams by pasting two diagrams together on a common edge:
        S    push    S'    push    S''
        o ---------> o ----------> o
        |            |             |
   down |            | down        | down
        V            V             V
        o ---------> o ----------> o
        I    incr    I'    incr    I''
This shows us that
down o push o push = incr o incr o down
The proof is embedded in the diagram, so we are spared the tedium of an algebraic proof.
Side Note: Categories resemble object orientation. OO defines an object as both data and functions; category theory maps both the data (objects) and the functions (arrows).
Fables
To show the application of the category theory notions, we'll use a few stories as springboards for discussion.
Fable 1. The Land Before Printers
In a faraway land, where computers were invented before printers, a printer merchant came to town. The merchant drew a picture like this:
Online o        o Paper
        \______^
          print
We want to consider the category implied by the diagram. To the merchant, there are only two objects, Online and Paper. To the townspeople, there is an object corresponding to each document. A simple attempt to identify the category reveals this misunderstanding. Category theory is "typed"; it requires explicit identification of the domains.
A similar analysis applies whenever crossing domains. Does a translator from C++ to C promise to produce merely some C program for a C++ program, or does it promise to produce a particular C program for a given C++ program? The diagram looks the same, but the analysis level can matter greatly.
Fable 2. Down Below the Ocean
A wizard came to the capital and bragged, "I'm the last survivor of Atlantis; nothing is left except me and my pen. As I'm the only person left who can read or write Atlantan, and it is everywhere admitted to be the best language for expressing laws, let me translate your laws into that glorious tongue."
The ruler should have learned to read Atlantan (unless the volumes were only meant as a coffee table decoration). This situation structurally matches the previous one:
Laws1 o        o Laws2
       \______^
      toAtlantan
The fable suggests the importance of having another arrow toEnglish:
       toEnglish
        ______
       v      \
Laws1 o        o Laws2
       \______^
      toAtlantan
Suppose the category objects are {Laws1, Laws2, Other}. We want to require
that toEnglish(toAtlantan(Laws1)) = Laws1: translating the laws into Atlantan and back must return the original laws.
This is known as a commuting diagram. It is an assertion that all paths for getting somewhere are equivalent. Diagrams can be composed, and very complex paths established. In this simple case, we realize that we have no way to verify the translation unless we have the second arrow.
The same analysis applies to printers. If we add a "retype" arrow, which means to re-enter a paper document back online, we have the key notion that lets us see how the townspeople in the first story can verify the problem: they see that the (obvious) retype function is incapable of making the diagram commute. (If we don't trust the printer, we'd better have an online copy somewhere to guide us in making corrections.)
Fable 3. Rarer Than a Pearl
After people had been printing for a while, they started noticing a problem: sometimes they would have a paper document for which they'd lost the online version, but they wanted it back online. They had no choice but to re-enter the document manually.
We have a discrepancy: the diagram
         unprint
         ______
        v      \
Online o        o Paper
        \______^
          print
applies at the highly abstract level ("is it online or not?"), but does not
apply at the level of individual documents (printing D1, and unprinting the
result, yields D2 rather than D1). But we want to say more; after all, the
unprinter is certainly doing something useful.
When there is a leaky or lossy transformation, there is an opportunity to find a "hidden" domain. This hidden domain may be trivial, or it may not even be obvious, but it provides a point of leverage. Here, the hidden domain might be an "abstract" version of a document.
Unprinting a printed document may yield a different document:
D1 o           o D2
    \         ^
     \       /
print \     / unprint
       \   /
        v /
         o
        P1
but they are in the relation "D2 is what results when you unprint what you get
from printing D1." We can consider a new abstract document AD:
        AD
         o
        ^ ^
   up  /   \ up
      /     \
     /       \
D1 o           o D2
    \         ^
     \       /
print \     / unprint
       \   /
        v /
         o
        P1
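One way to make the abstract document concrete is to group together online documents that a print/unprint round trip connects. The sketch below assumes two small, invented lookup tables, PRINT and UNPRINT, over six documents D1 to D6; the tables, the paper names P1 to P4, and the helper names are hypothetical and exist only to illustrate the idea.

```python
# Hypothetical behavior of a printer and an unprinter (illustration only).
PRINT = {"D1": "P1", "D2": "P1", "D3": "P2",
         "D4": "P3", "D5": "P4", "D6": "P4"}
UNPRINT = {"P1": "D2", "P2": "D3", "P3": "D5", "P4": "D6"}
DOCS = sorted(PRINT)


def round_trip(doc):
    """Print a document, then unprint the result."""
    return UNPRINT[PRINT[doc]]


def abstract_documents(docs):
    """Group documents connected by round trips into blocks.

    Returns a dict mapping each document to the frozenset (block)
    that contains it.
    """
    block_of = {d: {d} for d in docs}
    for d in docs:
        other = round_trip(d)
        if block_of[d] is not block_of[other]:
            # Merge the two blocks and point every member at the result.
            merged = block_of[d] | block_of[other]
            for member in merged:
                block_of[member] = merged
    return {d: frozenset(block) for d, block in block_of.items()}


UP = abstract_documents(DOCS)


def up(doc):
    """Take a document to the abstract document that contains it."""
    return UP[doc]


# The diagram's assertion: up(d) == up(unprint(print(d))) for every d.
diagram_commutes = all(up(d) == up(round_trip(d)) for d in DOCS)

for block in sorted(set(UP.values()), key=min):
    print(sorted(block))
print(diagram_commutes)
```

With these invented tables, D1 and D2 fall into one block, D3 stands alone, and D4, D5, and D6 share a block, because D4 round-trips to D5 and D5 round-trips to D6.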
The up function takes a document to its abstract form. The diagram asserts that up(d) = up(unprint(print(d))).
But what are the abstract documents? We'll treat the abstract documents as a partition of the electronic documents: each abstract document represents a set of documents. We can construct them by taking the closure of the print and unprint functions. Since unprint(print(D1)) = D2, D1 and D2 are in the same block of the partition. In our example, the abstract online documents are AD1 = {D1, D2}, AD2 = {D3}, AD3 = {D4, D5, D6}. (We could build a similar structure for abstract paper documents.) [[Side note: This glosses over where arrow starts]]
So, up takes a document to the abstract document which contains it. Since we have a partition, there is exactly one such abstract document.
We can compare unprinters by the way they partition the documents. One might be so bad it is essentially a "two-state" version. Another might tend to arbitrarily substitute letters; it has a block for each possible length of document. A good one might only confuse certain words (like "Smith", "SmiTh", and "5mith"); it will have many blocks.
This example reveals a guideline: many apparent inverses are not true inverses. When this happens, there may be an abstract domain that can reconcile the differences.
Real Problems
SortTables
Development of the SortTables system provides an example of how categorical notions can drive design decisions.
SortTables has a set of records, a current column (the sort order), and a current row (indicated by a cursor bar). One of the transformations available (arrows in the categorical framework) is to change the sort order. It is clear that the order of the records is changed by this operation; but what about the current row? This was not addressed by the original design, but came up during implementation.
The three possibilities were to leave it at the same position (rejected as confusing), to put it at the first row in the new order, or to locate the record that was current in the previous order and put the cursor there. The argument for maintaining as much information as possible across the "functor" won the day for the last choice.
It's interesting that people learning to use the system did not at first realize why the cursor remained on the record it did. They understood that the order was different, but didn't have a model to tell them why the current line was where it was. After a short period of use, they became accustomed to it, and agreed that it was the proper behavior, as they often wanted to explore items in the neighborhood of a given item when sorted by a different attribute.
Character Sets
Given a document in one character-set encoding, how do we display it in another? The SortTables interface has this problem, as do web gateways. (A source document might be in "ANSEL", the target in ISO-Latin-1.)
For example, the ANSEL encoding of US-MARC records encodes a letter with a diacritic as the diacritic mark (a value above 128) followed by the letter. The ISO-Latin-1 encoding uses a single character for the letter with the diacritic. Thus ANSEL "'e" must map to ISO-Latin-1 "é". Unfortunately, not all ANSEL characters have representations in ISO-Latin-1. (For example, the Polish slashed-L is missing.)
The question then becomes, "what to do about missing characters?" Two approaches are very common: either ignore the diacritic (which is not acceptable in all languages), or introduce "special translations" for the missing character (e.g., "*L" for the slashed-L).
Looking at this problem in the category theory framework reveals why both approaches must be regarded as hacks: it is not possible to map completely back from ISO-Latin-1 to ANSEL, as we have lost critical information. If the diacritic is ignored, we don't know whether to map ISO-Latin-1 "L" to ANSEL "L" or ANSEL "slashed-L".
If the special translation is introduced, we don't know whether to map ISO-Latin-1 "*L" to ANSEL "*L" or ANSEL "slashed-L". We might decide to live with the second "solution": perhaps "*L" never appears in the source text. Or we can introduce special mapping rules, perhaps involving control characters. (But this reduces readability.)
What is the solution? In a sense, there is none. Our domains are incompatible, and something has to give.
A partial approach is to map to a union character set. For example, the 16-bit Unicode character set contains all the characters in both ANSEL and ISO-Latin-1. By working in the common domain where possible, we defer the problem. Eventually, the piper must be paid if we ever must move "back" to ANSEL. If such back-translation is required, we will have to define unambiguous representations for each Unicode character.
But the example of the web gateway allows another way out. Some web browsers "know" Unicode, in the sense that they support character entities for Unicode characters. Thus, "Łukasiewicz" might be encoded "&#321;ukasiewicz". If we must deal with a browser that doesn't understand this much, we might use an even more verbose solution: "<img src="letter/Lslash.gif" alt="L/">ukasiewicz", wherein we display an image of the required character.
We gain a lot from mapping to the Unicode domain, because it solves other problems as well. Real systems don't deal with just two character sets: they have M input alphabets and N output alphabets. By using a union character set, we only need M+N mappings (M up, N down), rather than the M*N separate mappings that would otherwise be required. (This is similar to the problem of compiling M languages for N machines.)
Other Examples
(Needs expanding)
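Before turning to other systems, the union-character-set idea from the Character Sets discussion can be sketched in code. This is a rough illustration under loud assumptions: toy_ansel_to_unicode is a hypothetical two-entry stand-in for a real ANSEL decoder, its byte values are illustrative and have not been checked against the ANSEL standard, and the registry names DECODERS, ENCODERS, and convert are invented for this example. Only the Python standard library is used.

```python
import unicodedata

# Toy stand-in for ANSEL (assumption: byte values are illustrative only).
TOY_ANSEL_DIACRITICS = {0xE2: "\u0301"}  # combining acute accent
TOY_ANSEL_LETTERS = {0xA1: "\u0141"}     # Polish slashed-L


def toy_ansel_to_unicode(data):
    """Decode toy-ANSEL bytes: a diacritic precedes the letter it marks."""
    out = []
    pending = ""
    for byte in data:
        if byte in TOY_ANSEL_DIACRITICS:
            pending += TOY_ANSEL_DIACRITICS[byte]
        elif byte < 0x80:
            # Unicode puts combining marks after the base letter.
            out.append(chr(byte) + pending)
            pending = ""
        elif byte in TOY_ANSEL_LETTERS:
            out.append(TOY_ANSEL_LETTERS[byte] + pending)
            pending = ""
        else:
            raise ValueError("byte not in the toy table: %#x" % byte)
    # NFC composes "e" + combining acute into the single character "é".
    return unicodedata.normalize("NFC", "".join(out))


# M mappings "up" into the common domain (Unicode strings).
DECODERS = {
    "toy-ansel": toy_ansel_to_unicode,
    "latin-1": lambda data: data.decode("latin-1"),
}

# N mappings "down" out of the common domain.
ENCODERS = {
    "latin-1": lambda text: text.encode("latin-1"),
    # Characters outside ASCII become numeric character references.
    "html-ascii": lambda text: text.encode("ascii", "xmlcharrefreplace"),
}


def convert(data, source, target):
    """Go up to Unicode, then down to the target: M+N tables, not M*N."""
    return ENCODERS[target](DECODERS[source](data))


print(convert(b"\xe2e", "toy-ansel", "latin-1"))
print(convert(b"\xa1ukasiewicz", "toy-ansel", "html-ascii"))
```

Converting the slashed-L to "latin-1" raises UnicodeEncodeError in this sketch. That is the incompatibility described in the text showing up as an explicit failure rather than a silent substitution.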
RightPages
The RightPages system uses optical character recognition and page images to search for and present articles in an online library.
Page images are scanned in for display. The cover of the journal is scanned in also. The system also attempts to integrate the text of the article. If the text is online, it uses that. If not, it runs optical character recognition to try to recover the text.
Clearly, OCR is not a consistent process. We already know that it's not clear what document is converged to if it's cycled through printing and OCRing (recall the unprinter example). However, the designers take steps to avoid that problem: they try to ensure that each document is scanned in only once. Another problem is the inaccuracy of the OCR. They're aware of this too, as their interface encourages visual searching: by displaying the covers of various journals, it encourages people to look in places they already "know".
[[Do a categorical treatment first, then discuss problems identified in the system, their (RP) solution, and a CT solution.]]
What are the categories? We have a scanned version of a document, and a possible ASCII or OCR version. The ASCII version is clearly a category:
objects = strings
arrows = transforms on strings
The scanned image is too:
objects = scanned images
arrows = image edits
One approach:
         OCR
Image --------> ASCII
Another:
Image           ASCII
   OCR \       / copy
        ASCII2
i.e., convert the image to ASCII.
Yet another:
IMAGE X ASCII
      |
    IMAGE
("forgetful functor")
A document is thus two things: an image and some ASCII. They certainly can't be fully derived from each other. What does search produce? It must produce both to enable future manipulation.
         OCR
        ______
       v      \
ASCII o        o IMAGE
       \______^
         print
(especially for OCR: what is recovered doesn't include, e.g., font information).
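The "document as two things" reading can be sketched as a pair type with a projection that forgets the text. This is not the RightPages design: PageDocument, make_document, forget_text, and fake_ocr are hypothetical names, and fake_ocr is a stub standing in for a real OCR engine.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class PageDocument:
    """A document as a pair: IMAGE x ASCII."""
    image: bytes          # scanned page image
    text: str             # ASCII text, either original or recovered
    text_from_ocr: bool   # True if the text was recovered from the image


def forget_text(doc: PageDocument) -> bytes:
    """The forgetful projection IMAGE x ASCII -> IMAGE."""
    return doc.image


def make_document(image: bytes,
                  online_text: Optional[str],
                  ocr: Callable[[bytes], str]) -> PageDocument:
    """Use the online text if it exists; otherwise fall back to OCR."""
    if online_text is not None:
        return PageDocument(image, online_text, text_from_ocr=False)
    return PageDocument(image, ocr(image), text_from_ocr=True)


def fake_ocr(image: bytes) -> str:
    """Stub OCR (assumption): a real engine would inspect the image."""
    return "recovered text"


exact = make_document(b"page-image", "exact text", fake_ocr)
scanned = make_document(b"page-image", None, fake_ocr)
print(exact.text_from_ocr, scanned.text_from_ocr)
```

Keeping both halves of the pair, plus a flag for where the text came from, is one way a search result could "produce both" so that later operations know how far to trust the text.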
Pen-Based Systems
Some ink .../\__ meant to be a word. What can we say?
In the case of the Newton, it supposedly doesn't keep the ink form around for later analysis. (That is, handwriting analysis is a mode.) This may be a flaw.
Hierarchical Editing in a Linear World
Include other paper here
Multiple Hierarchy Document Model
TBD
Lessons
The stories and examples above suggest design rules that can be useful to people working with multiple document formats.
Conclusions
TBD
Bibliography
Superseded - OLD PARTS NOT RE-INTEGRATED
What is a View?
A view is a way of looking at things. To formalize the notion, we will frame views in the language of category theory, a branch of abstract mathematics dealing with the algebra of functions. We hope this will provide a more solid basis for the use of views, and hope to use this basis to generate insight into the problem of maintaining them.
Background and Direction
The programming language Smalltalk has developed views as part of a general framework for developing user interfaces. This framework is known as MVC, for Model-View-Controller. The Model is a single entity, on which several Views and Controllers may depend. Views are presentations of the model; Controllers allow interaction with (and modification of) the model.
Smalltalk factors user interfaces this way to provide intellectual control of the process of software development. As Models tend to be more stable than Views, Views are made to depend on Models rather than the other way around. In our categorical approach, we will be less concerned with which is the Model and which the View, and more concerned with translations between them.
Category Theory as Inspiration
A key idea in this research is that multiple views are important. Category theory is a meta-model, providing a framework for moving between views. We do not want to reject previous document models, but rather to address them through a common framework.
This research originally set out to develop yet-another-document-model: to more fully define what it means to be a document with multiple hierarchies. Instead, it has evolved toward a category-theory approach to dealing with documents. Category theory is a branch of mathematics that attempts to provide insight into the question "What is the fundamental structure?" This approach requires identifying and manipulating the categories involved. Most document models have been ones for which an obvious category exists (e.g., sets and graphs).
The difficulty in using documents is often not working with them in their intended domain, but rather moving them between domains. A category theory approach can provide insight into these difficulties and perhaps suggest resolutions.
Like [Mayfield etc.] we adopt the notion and notation of categories, but don't develop new theorems in category theory. Rather, we use category theory ideas to illuminate some dark corners in document models.
We will use some parables to demonstrate the utility of this approach in small examples. Then we will apply it to several extended examples, including ....
Copyright 1994-2006, William C. Wake - William.Wake@acm.org