Introduction

Argument for fragmentation

A structured document explicitly encodes part of its semantics. As a result, the perceivable representation of the document can be adapted semi-automatically to multiple supports, such as a website or a report. Compared to a traditional scenario, where an author creates both versions as separate documents, this can significantly reduce the authoring and maintenance effort.

Although the versions for different supports typically share most of their content, they are rarely identical. The traditional way to resolve this is to include all content in a single structured document and to select, at generation time, the content relevant to the chosen context. The problem with this approach, from an authoring perspective, is that the structured document can easily become cluttered and difficult to manage. Scenari adopts a slightly different approach by representing a structured document as a hierarchical network (a directed acyclic graph, DAG) of document fragments. An author includes, at authoring time, the relevant fragments by reference. Since a fragment can be referenced by several other fragments, an author can re-purpose the same fragment for multiple versions. The advantage of this approach is that an author can adopt the mindset of authoring a single support, which is more intuitive for most users.
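
The fragment network described above can be illustrated with a minimal Python sketch. The `Fragment` class, the `render` function and all fragment titles and texts are assumptions made for illustration; they are not part of Scenari. The sketch only shows how two supports can include the same fragment by reference.

```python
from dataclasses import dataclass, field


@dataclass
class Fragment:
    """A node in the fragment network (hypothetical model, not Scenari's API)."""
    title: str
    text: str = ""
    # Child fragments are included by reference, not copied.
    children: list["Fragment"] = field(default_factory=list)


def render(fragment: Fragment, depth: int = 0) -> list[str]:
    """Flatten a fragment and everything it references into indented lines."""
    lines = ["  " * depth + fragment.title]
    if fragment.text:
        lines.append("  " * (depth + 1) + fragment.text)
    for child in fragment.children:
        lines.extend(render(child, depth + 1))
    return lines


# One shared fragment, referenced by two supports (made-up content).
summary = Fragment("Summary", "We build authoring tools.")
contact = Fragment("Contact", "Reach us by e-mail.")
finances = Fragment("Finances", "Figures for the past year.")

website = Fragment("Website", children=[summary, contact])
report = Fragment("Annual report", children=[summary, finances])

website_lines = render(website)
report_lines = render(report)
```

Each support is authored as if it were the only one; the shared `summary` object simply appears in both child lists.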

Fragmentation extends re-purposing

Fragmentation primarily facilitates the management of multiple supports for a document. However, it also allows content to be re-purposed between multiple (independent) documents, which is a relatively underdeveloped aspect of document engineering research. Traditionally, content is re-purposed by copying it into a new document. For example, a text that summarizes the activity of an enterprise can be re-purposed for its website, the annual report, job announcements, etc. However, copying has the disadvantage that the copied fragments live on independently. Consequently, if the enterprise changes its director and the enterprise summary has to be updated, all copies need to be updated manually. In contrast, when the fragment is re-purposed by including a reference to it, all summaries are brought up to date with a single edit action.
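
The difference between re-purposing by copy and re-purposing by reference can be sketched as follows. The fragment store, the identifier `summary` and the texts are hypothetical; the sketch makes no claim about how Scenari stores fragments.

```python
# A fragment store keyed by identifier (hypothetical identifiers and texts).
fragments = {"summary": "The enterprise is led by director A."}

# Re-purposing by copy: the document holds its own independent text.
website_copy = fragments["summary"]

# Re-purposing by reference: each document holds only the fragment identifier.
website_refs = ["summary"]
report_refs = ["summary"]


def resolve(refs, store):
    """Look up the current text of every referenced fragment."""
    return [store[ref] for ref in refs]


# A single edit action on the shared fragment.
fragments["summary"] = "The enterprise is led by director B."

stale_website = website_copy                         # still names director A
current_website = resolve(website_refs, fragments)   # reflects the edit
current_report = resolve(report_refs, fragments)     # reflects the edit
```

The copied text keeps its old content until someone updates it by hand, whereas every document that resolves the reference sees the edit immediately.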

Fragmentation facilitates document (re)structuring/management/editing

The fragmentation paradigm allows for a number of authoring conveniences that improve the management of large documents in particular. Notably, the hierarchical structure of a document (e.g. chapters, sections, subsections), which is important for the discourse, is explicitly represented by the network of fragments. This allows an author to obtain an overview and conveniently structure the document. Furthermore, while editing, an author typically moves content to a different location in the document where it is more appropriate. In a traditional editing environment this is typically realized with a copy-paste operation. However, this operation can be cumbersome, time-consuming and error-prone when moving large pieces of content. In a fragmented document, content is included by reference. Consequently, moving content corresponds to moving the reference, which is a lightweight operation.
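
Why moving content is lightweight in a fragmented document can be shown with a small sketch. The `outline` structure, the `move_reference` helper and the identifiers are assumptions for illustration only.

```python
def move_reference(outline, fragment_id, source, target, position):
    """Move one fragment reference between sections; the fragment body is untouched."""
    outline[source].remove(fragment_id)
    outline[target].insert(position, fragment_id)


# Fragment bodies live in one place (hypothetical identifiers and texts).
bodies = {
    "intro": "Why fragmentation matters.",
    "method": "How the study was run.",
    "results": "A long section with many tables.",
}

# The document structure holds references only.
outline = {
    "chapter-1": ["intro", "results"],
    "chapter-2": ["method"],
}

# Relocate the long "results" fragment without touching its content.
move_reference(outline, "results", source="chapter-1", target="chapter-2", position=1)
```

Only two short lists change; the size of the moved fragment plays no role.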

The hypothesis: structured authoring and fragmentation reduce collaborative authoring overhead

Structured authoring and fragmentation, in general, reduce the authoring effort, as the popularity of the Scenari authoring environment suggests. A natural extension of the fragmentation paradigm (and of the Scenari authoring environment) is to include support for collaborative authoring.

We consider two types of collaboration:

  • A group of authors working on a common document

  • A group of authors that share fragments between multiple documents

We think that structured authoring and fragmentation will reduce the authoring effort in a collaborative environment:

  • by reducing collaboration overhead, notably the effort spent on aggregating and integrating contributions into a common document

  • by re-purposing shared fragments, notably in a project-like environment where project partners share content

Anticipated challenges with fragmentation (in a collaborative environment)

Fragmentation may address a number of problems typical for collaborative authoring. However, it also introduces complications, which are among the primary research topics in C2M:

  • Re-purposing - Consider an enterprise summary that is re-purposed by multiple documents. If the summary is modified, for example by including the revenue, this automatically updates all documents that reference the summary, even those where the change may be less appropriate, such as a job announcement.

  • Rights management - Another complication is rights management when there are multiple authors for a document. For example, an author is allowed to edit the enterprise website, but not the annual report. A conflict arises with the summary as it is shared by both documents.

  • Versioning - When two authors work collaboratively on a document, the typical situation arises that two versions have to be merged: the merge should incorporate the contributions of both authors and signal any conflict between the two versions. Traditional versioning tools do not consider fragmentation and can therefore not be used as is.

  • Archival - Similar to versioning, traditional archival solutions do not consider a document as a network of fragments. Therefore, new mechanisms may be needed to ensure that a document remains accessible over time.
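
The re-purposing and rights management complications listed above can be made concrete with a short sketch. All names and data (`includes`, `may_edit`, the author `alice`, the document identifiers) are hypothetical, and only direct references are considered; a full solution would also have to follow references transitively through the fragment network.

```python
# Which fragments each document includes by reference (hypothetical data).
includes = {
    "website": {"summary", "contact"},
    "annual-report": {"summary", "finances"},
    "job-announcement": {"summary"},
}

# Which documents each author may edit (hypothetical data).
may_edit = {"alice": {"website"}}


def documents_using(fragment_id):
    """Return every document that directly references the given fragment."""
    return {doc for doc, refs in includes.items() if fragment_id in refs}


def blocked_documents(author, fragment_id):
    """Documents affected by an edit that the author is not allowed to change."""
    return documents_using(fragment_id) - may_edit.get(author, set())


# Editing the shared summary touches every document that references it ...
affected = documents_using("summary")
# ... including documents the author has no right to edit.
conflicts = blocked_documents("alice", "summary")
```

A non-empty `conflicts` set signals the situation described in the rights management item: an edit that is permitted for one document would silently change another.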

Research questions

  1. Does fragmentation of structured documents reduce collaboration overhead?

  2. What are the bottlenecks due to fragmentation and re-purposing in a collaborative environment?

Research methodology

To answer the above questions we adopt the following approach. In section Use-cases we describe a number of scenarios that illustrate the required functionality. Based on these we derive functional requirements for a system that supports collaborative authoring of fragmented documents. The subsequent section ScenariWiki introduces the systems Scenari-Nuxeo and ScenariWiki, which implement the requirements derived in the prior section. Both systems are evaluated in a user-evaluation study, which is described in section GSU pilot.