Document Management and Web Technologies: Alice Marries the Mad Hatter*

V. Balasubramanian
E-Papyrus, Inc.
bala@e-papyrus.com

Alf Bashian
Merrill Lynch
abashian@njaost.ml.com

Introduction

Our experiences designing and developing a Web Information System (WIS) were reminiscent of the story of Alice's Adventures in Wonderland [5]:

The Hatter went on in a mournful tone, "And ever since then, time won't do a thing that I ask! It's always six o'clock now." A bright idea came into Alice's head. "Is that the reason so many tea-things are put out here?" she asked. "Yes, that's it," said the Hatter with a sigh: "it's always tea-time, and we've no time to wash the things between whiles." "Then, you keep moving round, I suppose?" said Alice. "Exactly so," said the Hatter: "as the things get used up." "But, what happens when you come to the beginning again?" Alice ventured to ask. "Suppose we change the subject," the March Hare interrupted.

If you have ever tried developing a document-oriented WIS, you probably have an inkling of what this conversation has to do with this paper. Read on and you shall find out more...

Since 1995, Web-based Information Systems (IS) have become commonplace in corporations and academic institutions. Designers of these systems face the need to resolve issues surrounding authoring, organizing, managing, and delivering large amounts of unstructured but timely information via the Web. We believe that the advent of a number of so-called "easy-to-use" Web authoring and management tools has trivialized the serious effort involved in developing a WIS. Careful planning and systematic design methodologies are even more important for WIS projects than for traditional IS projects.

Recently, researchers have described how the Web should embrace third and fourth generation hypermedia features in order to be a "true" hypertext system as envisioned by pioneers in the field. In the absence of a total re-architecture of the Web, IS designers are faced with integrating a myriad of "third-party" products with the Web to satisfy various requirements. In this paper, we describe how document management facilities were used to offset some of the inherent deficiencies of the Web in order to design and construct a large-scale content authoring and publishing system. This system delivers product and services marketing information to financial consultants via an Intranet. However, the marriage of mature document management functionality (Alice) and the more uncontrolled and ever changing Web technologies (the Mad Hatter) came with its own set of problems and compromises. Our challenges included educating users, managers, and developers about the realities behind creating such an integrated system.

We begin with a discussion of the requirements for the new system, followed by the deficiencies of using Web technologies alone to satisfy these requirements. The rationale for integrating document management functionality and the Web is presented. We provide an overview of the system architecture for authoring and publishing large-scale multimedia content. Challenges and issues we faced in terms of usage and management of the new system and its resulting processes are described. We also present some supporting work in the areas of hypermedia and document management for the Web. We conclude with lessons learned and pointers to developers, users, and managers.

Business Requirements for New System: Wonderland

Merrill Lynch is a financial management and advisory company with a global presence, providing financial management services to millions of households and businesses. Serving clients is a network of approximately 14,000 financial consultants and support personnel located in more than 550 sales offices worldwide. As part of a major business initiative called "Trusted Global Advisor" (TGA), Merrill Lynch is focusing on replacing its mainframe, text based information systems with client-server and Web based systems that are integrated under a single Graphical User Interface (GUI) Shell. Details of the TGA Shell are described in [8]. TGA has given internal marketing groups an opportunity to deliver product and services marketing information in a variety of formats and media to financial consultants via an Intranet. Subsets of this marketing information may also be delivered to other audiences such as clients and the general public.

The following requirements were presented to the systems development team:

These requirements were similar to those specified for an industrial strength hypermedia system for an engineering enterprise as reported in the classic paper by [10]. We were challenged with the same kinds of issues such as interactive authoring, templates, composites, object attributes, navigational aids, access control, version control, concurrency control, query mechanisms, interoperability, and collaboration.

Integration of Document Management and the Web: Alice Marries the Mad Hatter

The Web was an obvious choice to deliver multimedia material in an integrated fashion. Hypermedia functionality, as available on the Web, enables the structuring and linking of related material. However, the traditional Web model of authoring and managing documents in multiple media on a server was insufficient to meet requirements. Since we expected well over 10,000 documents, a file system alone would have been quite inadequate. Based on early prototypes, we soon realized that our goal was to develop a low-maintenance IS using the Web as a delivery vehicle. We also saw that it is easier to design and construct data-oriented WIS using highly trained technical staff than to construct self-sustaining document-oriented WIS. An emphasis of this project was to delegate and disperse responsibilities to various stakeholders in the organization. Marketing departments, products and services groups, editorial, legal, and technical staff would collaborate through a centralized document repository. We felt that a distributed yet centrally controlled environment would minimize the need for a large pool of supporting systems professionals.

Due to the highly unstructured nature of the material, the use of a relational database management system (RDBMS) was also insufficient. For example, information about products and services such as mutual funds, cash management accounts, etc., is highly graphical and customized in nature. Different products have different topics and we could not arrive at any generalized data model. While an RDBMS is good at managing relationships, it is not good at managing documents unless an application is built around it to do so. We found that many of the requirements could be addressed only with document management functionality. After reviewing a number of document management systems, we chose Documentum¹ since it was closest to meeting our needs. Documentum is an object-oriented client-server system, residing on top of a relational database, facilitating the storage, import, export, management, and retrieval of documents in multiple formats. It also has facilities to store documents as components, assemble them into different views, provide privileged access to authors, and institute workflows.

At the time we started the project in early 1996, no mechanisms were available to integrate document management and Web technologies. We were challenged with building a hybrid system, which would combine these so as to leverage the strengths of both. What we did not realize was that integrating these technologies would be a daunting task. This was not a marriage made in heaven. Table 1 below shows who brings what to this 'marriage'. The deficiencies of the Mad Hatter (Web) were compensated by the mature and stable Alice (Document Management). Conversely, Alice did not offer what the Mad Hatter did in terms of distributed, platform-independent delivery of multimedia content.

Table 1. "Who Brings What to the Marriage"

Document Management (Alice)                        | Web Technologies (Mad Hatter)
Manage large amounts of material                   | Deliver multiple media
Provide consistent and predictable structure       | Provide user interface and navigation
                                                   | Enable hyper-linking
Ensure currency                                    |
Facilitate non-technical authors with templates    | Facilitate non-technical authors with WYSIWYG tools
Support roles, responsibilities and access control |
Enable workflow                                    |
Publish multiple views                             |
Enable version control                             |
Provide document locking                           |
Enable recording of attributes                     | Enable attribute searching using meta-tags
Stable, well-defined functionality                 | Continuously evolving

System Architecture, Challenges and Issues: The Newlyweds

Researchers have urged MIS departments to observe the following in building successful WIS [2]:

Existing systematic hypermedia design methodologies were inadequate to design such a large-scale document-oriented authoring and publishing system. We extended RMM [9] to address our needs. Our development methodology consists of seven iterative stages, namely, information architecture, user interface and navigation design, content creation and authoring, workflow and document management, publishing, document review and link management, and search and retrieval. Details of this systematic design approach and software architecture are presented in [1]. We also conducted informal evaluation studies of the authoring, browsing, and retrieval user interfaces. Below we describe the highlights of the software architecture (refer to Figures 1 and 2) and present the challenges that we faced.

Figure 1. Software architecture of the authoring and publishing system.

At the heart of this architecture is a centralized document management system (DMS). Based on roles, customized client interfaces to the DMS have been created for various stakeholders in the organization. Even though the client interface to the DMS was graphical, it still required significant customization and training. Some concepts of document management are not common knowledge. Privileges and appropriate access control lists (ACLs) have been defined, providing authors secure access to the objects to which they were entitled. Administrators define the structure of product information with composition templates while authors create content within that structure. The challenge here was in defining roles and ACLs without becoming too complex. We had to strike a delicate balance between strict control and flexibility. Some explicit roles became blurred as business areas found it difficult to accept some mutually exclusive role definitions.
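The role-based access control described above can be illustrated with a minimal Python sketch. The permission levels, the role names, and the `is_allowed` helper are hypothetical and are not taken from the chosen DMS; the sketch only shows how a per-object ACL might map each role to the highest operation it is permitted.

```python
from enum import IntEnum


class Permit(IntEnum):
    """Hypothetical, ordered permission levels (a higher level includes lower ones)."""
    NONE = 0
    READ = 1
    WRITE = 2
    DELETE = 3


def is_allowed(acl, user_roles, needed):
    """Return True if any of the user's roles grants at least `needed` on this ACL."""
    granted = max((acl.get(role, Permit.NONE) for role in user_roles),
                  default=Permit.NONE)
    return granted >= needed


# Illustrative ACL for one product component (role names are assumptions).
component_acl = {
    "author": Permit.WRITE,
    "editor": Permit.READ,
    "administrator": Permit.DELETE,
}
```

Keeping the levels ordered means a user holding several roles simply receives the strongest grant, which is one way to tolerate the blurred role boundaries mentioned above without multiplying ACL entries.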

To begin, we used a sample financial product area, and prototyped the organization and information structure of its products (an activity called "information mapping"). We kept a 'flat' organizational structure at a physical storage level for all products, but logically they could be organized in any way. Hyperlinks allowed the logical organization of information to be different from the physical structure. This simplified authoring and linking as well as facilitated the automation of the publishing process. Although we were able to support any logical hierarchy, business areas had difficulty coming to an agreement on a default structure because they perceived that categorization could restrict a product's market.

We also prototyped the user interface and navigation mechanisms. Product information was defined as being made of components that could be authored separately by resident experts. For example, information about a mutual fund is composed of components such as Description, Client Suitability, Client Benefits, Performance, Sales Charges, Phone List, Risk, Marketing Materials, Sales Ideas, Competition, etc. (Figure 2). However, other products need not have the same set of topics. While the business liaison insisted on a consistent information structure and interface, product marketing groups wanted more autonomy and creative license. There was a continuous debate between financial consultants requesting a product reference guide and product marketing groups opting for a Webzine.

Figure 2. Component-based authoring and publishing.

For a large generic class of products, a technical team created HTML component templates with inline authoring instructions included as comments. Authors work with these, using WYSIWYG authoring tools without having to learn much about HTML. However, due to the infancy of WYSIWYG HTML authoring tools, we could not provide “iron-clad” templates as requested by the business team. Authoring tools need to be enhanced to support more error-proof templates, similar to those provided by most popular word processing programs.

HTML authoring tools were also not truly WYSIWYG, and they were continuously being modified and enhanced, making it difficult for us to select a standard tool. Consequently, we had to resist the temptation of trying out new and fancy Web tools that were quickly coming to market. Though we provided an HTML extension to a familiar word processor, authors still required significant hands-on training. Varied skill levels of authors and resistance to changing existing work practices posed a number of problems. In addition, there were backward compatibility problems between versions of the same tool, only months apart, as well as major incompatibilities between different vendors' tools. In the end, it became necessary to include a step in the publishing process that would strip out unwanted tags and perform some minor formatting adjustments.
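A clean-up step of this kind might look like the following Python sketch, which uses only the standard library. The whitelist of tags, the decision to keep only `href` on anchors, and the dropping of comments (including template instructions) are all assumptions made for illustration; the paper does not specify which tags were stripped.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist; the actual set of permitted tags is not documented.
ALLOWED_TAGS = {"p", "h1", "h2", "h3", "ul", "ol", "li", "a", "b", "i", "br"}


class TagStripper(HTMLParser):
    """Re-emit whitelisted tags only; text is always kept, comments are dropped."""

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED_TAGS:
            return
        # Keep only href on anchors; tool-specific attributes are discarded.
        kept = [(k, v) for k, v in attrs
                if tag == "a" and k == "href" and v is not None]
        attr_text = "".join(f' {k}="{escape(v, quote=True)}"' for k, v in kept)
        self.out.append(f"<{tag}{attr_text}>")

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS and tag != "br":
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")  # pass named entities through unchanged

    def handle_charref(self, name):
        self.out.append(f"&#{name};")  # pass numeric references through unchanged


def strip_unwanted_tags(html_text):
    parser = TagStripper()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

A whitelist is used rather than a blacklist because each new release of an authoring tool could introduce tags that a blacklist would not anticipate.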

The authoring and publishing cycle begins when an author checks out a pre-defined component of a particular product. After editing, the author checks the document back into the DMS, assigning it an appropriate version number. The DMS provides 'locking' and 'unlocking' facilities (check-out/check-in) to prevent two or more people from working on the same document. Authors and system administrators have access to older versions.
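The check-out/check-in cycle can be sketched in Python as follows. This is an illustrative in-memory model, not the DMS's interface: the `Repository` class, its method names, and the use of sequential integer version numbers (rather than author-assigned ones) are simplifying assumptions.

```python
class LockedError(Exception):
    """Raised when a document is locked by someone else or not locked at all."""


class Repository:
    def __init__(self):
        self._versions = {}  # document name -> list of contents, oldest first
        self._locks = {}     # document name -> user currently holding the lock

    def add(self, name, content):
        self._versions[name] = [content]

    def check_out(self, name, user):
        """Lock the document for `user` and return its latest content."""
        holder = self._locks.get(name)
        if holder is not None and holder != user:
            raise LockedError(f"{name} is checked out by {holder}")
        self._locks[name] = user
        return self._versions[name][-1]

    def check_in(self, name, user, content):
        """Store a new version, release the lock, and return the version number."""
        if self._locks.get(name) != user:
            raise LockedError(f"{name} is not checked out by {user}")
        self._versions[name].append(content)
        del self._locks[name]
        return len(self._versions[name])

    def version(self, name, number):
        """Return an older version; numbering starts at 1."""
        return self._versions[name][number - 1]
```

Because every check-in appends rather than overwrites, earlier versions remain retrievable, which matches the access to older versions described above.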

Authors can create links from their assigned products to other related products. We minimized dead link problems by providing a Web interface to a dynamically generated list of all products and services currently created and enrolled in the system. The author can browse through this list, simultaneously displaying each item’s contents in a different frame. The relative URL to be used for the link appears in a fixed location and can be cut and pasted by the author into the authoring tool at the appropriate content location. Authors found it difficult to understand the concept that documents must exist before links are created. This is a non-linear activity and it is not the way people are trained or accustomed to writing and presenting content. As the product information base grows and as authors receive more training, this problem should be somewhat alleviated. Still, all this relies heavily on linking methods and procedures to be followed by the author. Instead, if link verification were an integral part of the authoring tool, there would be less room for error.

Each component or document in the DMS gets a standard set of attributes such as Title, Keywords, Author, Creation Date, Modified Date, Version Number, etc. During the check-in process, the author defines additional characteristics on the components, such as their suitable target audience. These characteristics are later used to guide the assembling of different views for different audiences. Theoretically, the creation of multiple views was desired to avoid having to re-author marketing text to provide it to different readers. As it turned out, audiences were so different that authoring once for all readers was not appropriate. However, this feature may be used later to publish a printed version different from an online version.

Using the DMS, the author also defines product-level characteristics such as product category, client suitability, risk level, etc. These are later used to embed meta-tags in the published HTML for the purposes of search and retrieval. The initial list of these applied well to a majority of products, but became irrelevant for some others. For example, an equity may have a stock symbol and a risk attribute, but these are not applicable to an account service. As the attribute list grew to cover all instances, it became difficult to control and provide meaningful search results.
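As a hedged illustration of how product-level attributes might be turned into meta-tags at publishing time, consider the Python sketch below. The function name and the attribute names in the usage note are hypothetical; the only behavior taken from the text is that attributes are embedded as meta-tags and that some attributes do not apply to every product.

```python
from html import escape


def meta_tags(attributes):
    """Render one <meta> tag per applicable attribute; skip empty or missing values."""
    lines = []
    for name, value in attributes.items():
        if value is None or value == "":
            continue  # attribute does not apply to this product
        lines.append(
            f'<meta name="{escape(name, quote=True)}" '
            f'content="{escape(str(value), quote=True)}">'
        )
    return "\n".join(lines)
```

For example, `meta_tags({"category": "Account Service", "symbol": None})` emits a tag for the category only. Omitting inapplicable attributes, rather than publishing them with empty values, is one way to keep a growing attribute list from polluting attribute-based search.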

An author previews work by publishing to a Staging Web server and using a standard Web browser. The publishing process generates the necessary user interface and global navigation components thereby maintaining consistency. The top frame with title and global navigation elements and the left frame with local navigation elements (shown in Figure 2) are automatically generated. The assembled compound document of components relies heavily on a particular feature of the chosen DMS called the Virtual Document Manager. The assembly of an appropriate view (body frame of the layout shown in Figure 2) is constructed by combining components based on the value of the target audience attribute of each component defined by an author. The documents are then "pushed" to an appropriate location on the Staging Web server. Versions of these generated documents are also stored in the DMS for future reference, thereby satisfying one of the primary legal requirements. Efforts to standardize the user interface at the corporate level required us to change it a few times. This was easily accomplished by making minor changes to our publishing procedures, without having to rewrite all documents by hand. All product documents can be re-generated easily to reflect these changes.
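The audience-driven assembly step can be sketched in Python as below. This does not reproduce the Virtual Document Manager or any DMS interface; the `Component` type, its fields, and the simple heading-plus-body layout are assumptions chosen to show the filtering logic only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    topic: str            # e.g. "Description", "Risk"
    body: str             # authored HTML fragment
    audiences: frozenset  # audiences this component is suitable for


def assemble_view(components, audience):
    """Concatenate, in order, the components tagged for the given audience."""
    parts = [
        f"<h2>{c.topic}</h2>\n{c.body}"
        for c in components
        if audience in c.audiences
    ]
    return "\n".join(parts)
```

Because the view is computed from attributes at publishing time, a change to the shared layout only requires re-running the assembly, not editing each component by hand.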

Once an author is satisfied with his or her work, he or she sends notification to a group of editors with the URL of the staged document via e-mail or through a router available from the DMS interface. An editor looks up the URL using a standard browser, either reads the document online or prints it, makes necessary comments, and sends them via e-mail back to the author. The author makes the necessary changes and notifies the editor again, and the editor forwards the document URL to the attorney. If the attorney is satisfied, the editor is notified and in turn notifies the author. The author then requests the content administrator to promote the document for release to the Production Web server, which is accessed by the financial consultants. Business areas had difficulty describing their desired workflow. There was resistance to changing existing workflows. Editors and attorneys were not very enthusiastic about reviewing and approving content online. Additionally, attorneys were supported by secretaries and relied on handwriting for authenticity. Also, problems arose due to incompatibilities in network connectivity and e-mail software among the parties involved.
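The review path described above can be written down as a small transition table. The Python sketch below is illustrative only: the state names, action names, and the `advance` function are hypothetical labels for the steps in the text, and it models only the transitions the text mentions.

```python
# (current state, acting role, action) -> next state
TRANSITIONS = {
    ("staged", "author", "submit"): "editorial_review",
    ("editorial_review", "editor", "return_comments"): "staged",
    ("editorial_review", "editor", "forward"): "legal_review",
    ("legal_review", "attorney", "approve"): "approved",
    ("approved", "administrator", "promote"): "production",
}


def advance(state, role, action):
    """Return the next workflow state, or raise ValueError if the step is not allowed."""
    try:
        return TRANSITIONS[(state, role, action)]
    except KeyError:
        raise ValueError(f"{role} may not {action} a document in state {state}") from None
```

Writing the workflow as data makes it easier to review with business areas that find it hard to describe their desired process, since each row is a single, checkable rule.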

The documents on both the Staging and Production Web Servers are reviewed or “tested” by system administrators. Both servers also execute link verification engines to detect any dead links. The chances of dead links are minimized by the fact that authors can link only to documents that exist in the system. However, when a document is already linked to another document and the latter is deleted, there is no automatic notification mechanism to the author of the former document. Hence, link verification is still necessary. However, this link verification is “after-the-fact” and we will later describe some of the limitations of such an approach. Also, both servers have search engines running as daemon processes, which continuously index content. This enables users to retrieve content using attribute-based and full-text searches. This completes the authoring and publishing cycle. Periodic notifications are sent to authors to re-examine content for currency.
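An after-the-fact link check of the kind described can be sketched with the Python standard library. The sketch assumes the published site is available as a mapping from server-relative paths to HTML text and checks only internal relative links; the function and class names are hypothetical and do not describe the verification engines actually deployed.

```python
import posixpath
from html.parser import HTMLParser
from urllib.parse import urlsplit


class HrefCollector(HTMLParser):
    """Collect the href value of every anchor tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for key, value in attrs:
                if key == "href" and value:
                    self.hrefs.append(value)


def find_dead_links(site):
    """Return sorted (source path, href) pairs whose internal target is missing."""
    dead = []
    for path, html_text in site.items():
        collector = HrefCollector()
        collector.feed(html_text)
        for href in collector.hrefs:
            parts = urlsplit(href)
            if parts.scheme or parts.netloc or not parts.path:
                continue  # external or fragment-only link: out of scope here
            target = posixpath.normpath(
                posixpath.join(posixpath.dirname(path), parts.path))
            if target not in site:
                dead.append((path, href))
    return sorted(dead)
```

Such a scan reports a broken link only after the target has already disappeared, which is the limitation of the after-the-fact approach noted above.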

Both the user interface (formed of generated HTML pages) and search interface have been evaluated through informal usability studies, but our results were mixed. While some financial consultants were familiar with the Web metaphor and could easily navigate through the information space and accept some of the limitations of the interface, others expected the system to function like a typical event-driven client-server application. We also saw that usability specialists needed some background on the deficiencies of the Web browser as a user interface. Overall, they were also not happy about the precision and accuracy of information retrieval through attribute-based and full-text searches. Authors have been trained on the client interface to the DMS. Usability tests on this have not been carried out. However, weekly feedback sessions and focus groups, called “The Authoring Cafe”, are held between technical staff and authors to exchange notes about the authoring interface and tools.

Some authors were overwhelmed by training in a number of areas simultaneously such as information mapping, authoring guidelines, tools, client interface to the DMS, and adapting to new work processes. Delays due to designing, developing and deploying such an integrated system had quite an impact on the organization. The desire to be “out there” as soon as possible, drove some of the well-funded marketing groups to break away from a centralized model towards a more 'Webzine'-like approach. Product marketing groups find it less attractive to “give in” to centralized control of content. These groups have decided to adopt their own tools for authoring and version control. It remains to be seen how they fare with internal regulatory staff, corporate standardization, and integrated searching.

While using the DMS, authors and marketing groups can update and publish Web documents as information changes. Primarily, we adopted a static publishing model. It is based on "pushing" information from the DMS to the Web site since much of the content is static and must be legally approved. However, information about performance, telephone lists, commissions, etc., is derived from dynamic data sources, which proved difficult to integrate with this publishing model. We evaluated a Web-enabled document management product that utilized a dynamic publishing model, but found that the features of this new product did not adequately address our needs.

Related Work: Supporters of the Marriage

Some prominent researchers [2, 3] have compared existing Web infrastructure to second-generation computing languages. They have recommended the incorporation of third and fourth-generation hypermedia features into the Web. They have identified the need for typed nodes and links, structural query mechanisms, transclusions, warm and hot links, computed and personalized links, access control, global and local overviews, navigation and backtracking facilities, trails and guided tours, external link databases, and annotation facilities.

We have stated earlier that in addition to lacking classic hypermedia functionality, the Web is lacking in other equally important areas pertaining to the creation, management, delivery, and retrieval of documents. This concern is also shared by [11], who have described how document-based repositories have been “less-than-successful” due to inadequacies of the Web infrastructure. They have identified the need for enhanced functionality to manage collections, to support document management functions such as access control, structural and content version control, concurrency control, and change notification, and to integrate link management functions within the Web server. They too have called for the integration of document management functionality into the Web infrastructure.

Lessons Learned and Pointers: Divorce is Not the Answer

Although the Web has simplified information delivery, tasks such as building the authoring environment, managing documents, and relationships between them are not easy to accomplish without a large technical staff. Managers and developers can easily get misled by so-called easy-to-use Web based tools which convey the impression that look and feel is everything about developing a WIS. The advent of these tools, which are often in “beta” mode, has trivialized the need for systematic design methodologies, controls, and management issues that are a must in any medium or large-scale IS project. The real challenge is in managing information on a regular basis so that both content and links are up to date. If enough planning and design does not take place, organizations can quickly walk into document maintenance nightmares. Since users place more expectations on the Web in terms of currency of information, it is just as important to have a well-planned, well-designed and tightly controlled environment to support their needs.

Link management should become an integral part of the authoring and publishing environment [4]. There should be instantaneous recording and verification of links in a link database. Similarly, when a link is deleted, the corresponding entry from the link database must be removed. In addition to ensuring 100% link integrity, this would also eliminate the need for manual intervention, facilitate visualization of the information space, enable link change notifications, and provide link traversal privileges.
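A link database with these properties might be sketched in Python as follows. This is a minimal in-memory model under stated assumptions: the `LinkDatabase` class and its methods are hypothetical, links are verified at creation time by requiring both endpoints to exist, and "notification" is reduced to returning the list of affected source documents.

```python
from collections import defaultdict


class LinkDatabase:
    def __init__(self):
        self.documents = set()
        self.outgoing = defaultdict(set)  # source -> set of targets
        self.incoming = defaultdict(set)  # target -> set of sources

    def add_document(self, doc):
        self.documents.add(doc)

    def add_link(self, source, target):
        """Record a link only if both endpoints exist (verification at authoring time)."""
        for doc in (source, target):
            if doc not in self.documents:
                raise KeyError(f"unknown document: {doc}")
        self.outgoing[source].add(target)
        self.incoming[target].add(source)

    def remove_link(self, source, target):
        """Remove the corresponding entry when an author deletes a link."""
        self.outgoing[source].discard(target)
        self.incoming[target].discard(source)

    def delete_document(self, doc):
        """Delete a document and its links; return the sources whose authors need notice."""
        affected = sorted(self.incoming.pop(doc, set()))
        for source in affected:
            self.outgoing[source].discard(doc)
        for target in self.outgoing.pop(doc, set()):
            self.incoming[target].discard(doc)
        self.documents.discard(doc)
        return affected
```

Maintaining the reverse index (`incoming`) is what makes change notification cheap: the documents affected by a deletion are known immediately, without scanning the published site.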

According to [11], Web technologies need to support existing work processes just as work practices need to adapt to emerging Web technologies. We believe that both sets of changes are required in order to fully enable collaboration among distributed groups. Normally, a new brochure on a financial product is issued every six months, and most of that time is spent on the creation, review, and publishing phases. Document management technologies with workflow facilities, combined with Web technologies, can re-engineer these complex and time-consuming phases, thereby increasing the effectiveness of marketing campaigns. However, we have seen resistance to changing existing work practices, due either to organizational culture or to the costs involved in having a common technology platform across various departments. We have seen that development of a WIS is not just a technological issue; it is also an organizational, political, and cultural issue. In spite of our careful planning, adopting systematic design methodologies, and integrating a variety of technologies, our initial perception is that the system is "less-than-successful" [11] due more to these non-technical factors.

While a number of requirements specified by [7, 10] and us can be addressed by integrating a wide variety of tools [6], we still do not have a totally seamless solution. We believe that the best approach is to integrate document management and hypermedia functionalities into the existing Web infrastructure in order to address a majority of the issues mentioned. This would help to avoid the myriad of tools and techniques that are being proposed, almost on a daily basis, by software vendors. This proliferation creates an interoperability nightmare for application developers and end-users. We are reminded of the incompatibilities created by client-server applications of the late eighties and early nineties. We are also concerned that product-differentiation wars between vendors will result in undermining the very distributed, platform independent, inter-operable nature of the Web, further causing design, construction, and maintenance problems of WIS. We urge designers and developers to keep this in mind when they add new features or enhance existing infrastructures. Just as the Web needs ubiquitous hypermedia support [2], it also needs ubiquitous document management support in order to design and construct large-scale, distributed, industrial-strength, platform-independent Web information systems.

Acknowledgments: We thank the following: Mike Snizek, Phil Gilligan, Dan Porcher, Lorraine Franza, Gail Davala, Luanne Arico, Joe DeFranco, Ray Walters, Marc Harbatkin, Emma Jaffe, Rich Caran, Robert Raud, Susan Hopper, Melenda Moore, Gururajan Rao, Bruce Weimer, Piyush Pandya, Oliver Smith, Paul Kahn, and the team at Dynamic Diagrams, Inc. We thank Paul Kahn for his comments on earlier renditions of this paper. We also thank Dr. Michael Bieber for encouraging us to submit this paper to this special issue.

References

  1. Balasubramanian, V., Bashian, A., and Porcher, D. (1997). A Large-Scale Hypermedia Application using Document Management and Web Technologies, Proceedings of Hypertext '97, ACM Press, 134-145.
  2. Bieber, M., and Vitali, F. (1997). Toward Support for Hypermedia on the World Wide Web, IEEE Computer, January 1997, 30(1), 62-70.
  3. Bieber, M., Vitali, F., Ashman, H., Balasubramanian, V., and Oinas-Kukkonen, H. (1997). Fourth Generation Hypermedia: Some Missing Links for the World Wide Web, International Journal of Human-Computer Studies, July 1997, 47, 31-65.
  4. Carr, L., Davis, H., De Roure, D., Hall, W., and Hill, G. (1996). Open Information Services. Proceedings of the Fifth World Wide Web Conference, http://diana.ecs.soton.ac.uk/~lac/WWW96/Overview.html.
  5. Carroll, L. Alice's Adventures in Wonderland, The Complete Works of Lewis Carroll, The Modern Library, New York.
  6. Dieberger, A. (1996). Browsing the WWW by Interacting with a Textual Virtual Environment - A Framework for Experimenting with Navigational Metaphors. Proceedings of Hypertext'96, 170-179.
  7. Halasz, F.G. (1988). Reflections on NoteCards: Seven Issues for the Next Generation of Hypermedia Systems. Communications of the ACM, 31(7), 836-855.
  8. Hopper, S., Hambrose, H., and Kanevsky, P. (1996). Real World Design in the Corporate Environment: Designing an Interface for the Technically Challenged. Proceedings of CHI '96, 489-495.
  9. Isakowitz, T., Stohr, E., and Balasubramanian, P. (1995). RMM: A Methodology for Structuring Hypermedia Design. Communications of the ACM, 38(8), 34-44.
  10. Malcolm, K. C., Poltrock, S. E., and Schuler, D. (1991). Industrial Strength Hypermedia: Requirements for a Large Engineering Enterprise, Proceedings of Hypertext '91, 13-24.
  11. Rein, G. L., McCue, D. L., and Slein, J. A. (1997). A Case for Document Management Functions on the Web, Communications of the ACM, 40 (9), 81-89.

Authors

V. Balasubramanian (bala@e-papyrus.com) is a research-oriented practitioner. He heads his own consulting firm, E-Papyrus, Inc., specializing in hypermedia design, Web-based information systems, user interface design, document management, and workflow applications. He holds a Ph.D. in Management from Rutgers University.

Alf Bashian (abashian@njaost.ml.com) is a Systems Architect and Assistant Vice President in the Advanced Office Systems and Technology group at Merrill Lynch in Princeton, NJ. He holds a Master's degree in Computer Science from Pace University. His experience over the last 15 years includes developing client-server applications, Web-based information systems, document management, and workflow.


* Both authors contributed equally to this paper.

¹ http://www.documentum.com. Documentum is a registered trademark of Documentum, Inc.