We are happy to announce that Wrestling Legacy Data to the Web & Beyond: Practical Solutions for Managers & Technicians is available for ordering.


Please respect our copyright!

From
Wrestling Legacy Data to the Web & Beyond: Practical Solutions for Managers & Technicians

Chapter 1: The Battle Cry!

You’ve heard the questions from managers, consultants, and industry pundits:


“Why can’t we re-use the data we already have?”

“Why can’t we view all of our documents on the web?”

“Why can’t our applications talk to our PDAs and cell phones?”

“Why can’t I print anything, anywhere?”
 

And, the dreaded:

“Why can’t we look at our data anywhere like they do on TV?”
 

 

The answers usually involve the words migrate or transform, which become the battle cries heard in every industry from banking to agricultural engineering. Those words, migrate and transform, mean something different to everyone who uses them. For most companies they mean making legacy data available beyond its typical paper delivery system, and that is not a challenge for the faint of heart.

We are going to take a careful look at all of the factors that go into a successful marriage of legacy print applications and delivery of the legacy print data using alternative means, including the web, PDAs, and even electronic ink-based devices. When you reach the end of the book you should know what you need to know to ask intelligent questions of your vendors and colleagues, as well as where to go for more help. You should even be able to interpret the answers they give you!

The simple fact in the world of corporate high-speed printing is that not everyone who works with the printers and print data has at their disposal all of the terminology for the components of their original format or their target format. We are going to try to help with that, too. First, we use the term resource throughout the book. The term resource is not universally understood, however. To be clear, we mean the fonts, forms, graphics, and print environment files that go into controlling the print job.
These resources are critical. And, no matter how many times you ask, most people do not know what resources are required for a job…even if it’s in production.

What does it mean to work with legacy data and bring it to alternative platforms? Engineering the solution is different for every company, but the mechanics and tools are remarkably similar. The hundred-thousand-foot overview is that you’ll need to uncover the deep dark secrets of all of your applications, then try to locate the best solutions for moving both the applications and the data they work with to the new platforms and delivery vehicles.

All the while, you’ll be battling look and feel issues. When you move to the web:

    • How close does the web version have to be to the paper version of the document?
    • Can you re-design for new delivery devices, or do regulatory requirements force you to duplicate the paper version of your documents?
    • How small is the type?
    • What do you do about color?
    • How large are the graphics?

Then, you move into even more interesting questions. What happens when you move to even smaller display devices like cell phone screens, PDA screens, or even purpose-built devices? What are your liabilities? What are your responsibilities?

Beyond the look and feel of your applications, there are the security issues and access issues that raise their heads at every turn.

About the time you think you’ve answered every question and encountered every problem, testing will uncover even more of those deep dark secrets. You will find graphics that were created with proprietary fonts and data that was re-mapped in COBOL, PASCAL, PL/I or even FORTRAN routines that may not move to the new platforms. The routines may be hidden in external procedures and files that hide them from view. You may even find programming written within the print applications using the printer datastream language! The older the application, the more likely it is that you’ll find interesting anomalies. However, do not be surprised if you find them in even your newest applications.

If this sounds like a big job, it is. However, it is not an impossible job. Many companies have successfully moved applications originally designed for Xerox, IBM, generic line printers, and even other proprietary printers to new print environments, to the web, and beyond. They have taken many paths, and many innovations have been forged along the way. So, let’s look at the tools you need to determine how to transform the output from any application so that it is available for use on any delivery device, no matter how big or how small, wired or wireless.


The Problem with Legacy Data

Legacy Data comes in a variety of sizes, shapes and descriptions. It may be the output of programs written in-house over the past 30 years, the output of commercial software installed over the past 30 years, or even the output of programs written or purchased within the past few months.

Within the enterprise, it may have many personalities, especially if the corporate culture permits departments to build or buy their own application solutions. Some departments may have grown from departmental systems through mid-range systems like the IBM office systems, System/36s and AS/400s on their way to network-based printing. Others may have selected office systems from companies like Xerox, Datapoint, Data General, and Tandem, while others remained on paper until the advent of the networked PC. And, there are those who are at home on the big iron - developing all of their support applications on the IBM or IBM plug-compatible hosts.
Regardless of the pedigree, legacy data poses challenges. It is the result of the business of getting enterprise information, such as invoices, statements, specifications, policies, reports, and just about every other type of business document, through the application and print process. It may be simple line data, consisting of little more than the actual text to be printed and some page eject commands, or it may have evolved to a more sophisticated form of line data that includes font calls, inserted graphics, calls to forms that overlay the data, and re-organization of the incoming data.
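
To make the simplest case concrete, here is a minimal sketch in Python of splitting basic line data into pages. It assumes ASA-style carriage control, in which the first character of each record is a control code and '1' means skip to the top of a new page. Your data may use machine carriage controls or none at all, and the function name and sample records are invented for illustration.

```python
def split_pages(records):
    """Group line-data records into pages.

    Assumption: each record starts with an ASA carriage control
    character, and '1' means "skip to the top of a new page".
    """
    pages = []
    current = []
    for record in records:
        control, text = record[:1], record[1:]
        # A page eject closes the page in progress, if there is one.
        if control == "1" and current:
            pages.append(current)
            current = []
        current.append(text)
    if current:
        pages.append(current)
    return pages


# Hypothetical sample: two pages of a statement run.
sample = [
    "1ACME STATEMENT      PAGE 1",
    " Account 12345",
    " Balance due 100.00",
    "1ACME STATEMENT      PAGE 2",
    " Thank you for your business",
]

pages = split_pages(sample)
print(len(pages))  # 2
```

Real line data rarely stays this simple. Once font calls, overlays, and re-organized fields enter the picture, the page boundaries are only the first thing you need to recover.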

Beyond basic line data, there are the complex additions to line data formats, most commonly using IBM’s Advanced Function Printing/Presentation (AFP) and Xerox Dynamic Job Descriptor Entries (DJDEs). The syntax for AFP commands and DJDEs has evolved over the years, so you may find inconsistencies in coding and old syntax in your data.

Take another step and you meet the more complex forms of AFP that include composed data representations with a myriad of variations. Xerox print applications can have the appearance of full application programs or they may use Xerox’s proprietary Metacode print datastream instead of or in addition to line data marked up with DJDEs.

Don’t forget the data formatted for printing on PCL printers (from Hewlett-Packard or other vendors) and the data created for printing on PostScript devices or viewed in Acrobat using the Portable Document Format (PDF).

Over time, all of these formats and languages have changed. Subtly at times, and dramatically at others, they have grown to accommodate the evolution of the languages and the devices they support. The programmers assigned to the print applications may have made changes to the base applications or written bridge code to reformat incoming data. As you can guess, they may not have documented everything they did in the rush to meet application deadlines. These things will make re-using the existing applications more challenging. This is the problem with legacy data.

A Glance toward Your Applications

Even after you identify the print characteristics of your applications, you will face the challenge of identifying the applications that produce the print and getting that old data into a format appropriate for your new output medium. Most large enterprises must still print most of their application output at some point in its life, and that print generally revolves around print formats like IBM’s AFP, Xerox DJDE/LCDS, line data of all varieties, and the original desktop formats like HP PCL, Adobe PostScript and Adobe PDF.

Those applications generally use fonts with varied histories. Some of the fonts were installed with the printers and their basic programming applications as far back as the 1970s, when they were designed for use on lower resolution print devices.

One of the big problems with legacy data revolves around the fonts used to develop the original application. There is a lot more detail about font issues in later chapters, but here we want to shine a light on font basics. Without a clear understanding of what fonts your data expects to use, it is easy to make poor choices about how to handle fonts in the new delivery environment.


Why Care About Fonts?

The fonts that you use with your legacy data are files with information about how to map the data to a visual representation of the data. That representation is often specific to the hardware used for printing. When you try to migrate to new and exciting output devices it can be difficult to cause the data to appear identical to the original print because of the variations in the font file formats, their internal architectures for building the character images, and how they handle the white space between characters and between the lines.

If you are old enough to remember typewriters, you might remember the difference between Pica and Elite typewriters. If you typed a document on a Pica typewriter, you saw different line endings than you saw if you typed on an Elite typewriter. Pica and Elite typewriters each use a different fixed number of characters per inch (10 for Pica, 12 for Elite). Those problems are multiplied a thousand-fold in the world of legacy migration.
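
The typewriter effect is easy to reproduce. The sketch below, in Python, wraps the same text for a 10-characters-per-inch (Pica) device and a 12-characters-per-inch (Elite) device. The 6.5-inch line length, the sample text, and the function name are assumptions chosen for illustration.

```python
import textwrap

PITCH = {"pica": 10, "elite": 12}  # fixed characters per inch


def wrap_for(typewriter, text, line_inches=6.5):
    """Wrap text the way a fixed-pitch device with this line length would."""
    width = int(PITCH[typewriter] * line_inches)
    return textwrap.wrap(text, width=width)


text = "Legacy print data was formatted for one device. " * 6

pica_lines = wrap_for("pica", text)
elite_lines = wrap_for("elite", text)

# The same words break at different places on each device.
print(pica_lines[0])
print(elite_lines[0])
```

The data did not change between the two runs, only the pitch. A font substitution during migration has the same effect on every line of every document.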

Prepare to make decisions regarding what fonts are critical to the look and usability of your documents, which have legal requirements associated with them, which have corporate branding issues associated with them, and which can reasonably change without requiring a vote of the corporate board of directors.

The Look of the Document

While one of the problems you encounter is older fonts, another potential problem area involves the graphics that populate your business documents. You may have corporate logos and product branding icons associated with processes and functions, or even signatures or text blocks that could not be rendered using a real font. Each of the possibilities has its own challenges.
One issue with graphics involves resolution. If you take a 300 dot per inch (dpi), 480 dpi, or 600 dpi graphic and put it onto a screen with an average equivalent resolution of 72, 90 or even 120 dpi and a completely different color model, the results may not be acceptable. Does that mean every graphic will need a makeover? It is definitely a possibility, but a lot depends on how the original graphics were created, if source files are available, and how much fidelity to the original is required. We look at all of this in a later section, but here we want you to understand that there will be a lot to consider.
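
The arithmetic behind the resolution problem can be sketched in a few lines of Python. The logo dimensions and the function name are hypothetical. The calculation simply holds the physical size of the graphic constant and asks how many pixels the screen has to work with.

```python
def screen_pixels(print_pixels, print_dpi, screen_dpi):
    """Pixels needed on screen to keep the graphic's physical size."""
    inches = print_pixels / print_dpi
    return round(inches * screen_dpi)


# Hypothetical logo: 600 x 300 pixels, created for a 300 dpi printer
# (so it prints 2 inches wide by 1 inch tall).
logo_width, logo_height = 600, 300

for screen_dpi in (72, 90, 120):
    w = screen_pixels(logo_width, 300, screen_dpi)
    h = screen_pixels(logo_height, 300, screen_dpi)
    print(f"{screen_dpi} dpi screen: {w} x {h} pixels")
```

At 72 dpi the 600 x 300 logo has to be squeezed into 144 x 72 pixels, which is why fine detail created for print often does not survive the trip to the screen.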

The file formats, including fonts, graphics, electronic overlays/forms, and other resources used to create your print, are only a part of the picture. They are the biggest part, but not the only part. There is also the question of the real estate available to present your data, how it is oriented (tall/portrait or wide/landscape), and how you want to handle the navigation of the document in its new form. These are big questions because a screen of any size is not a piece of paper.
Think about that for a moment. When you look at a screen, the proportions of height-to-width are closer to a landscape page than a portrait page. Much of the information we try to display, however, originates as a portrait printed page. From the start, we are going to have a challenge. You have to stop thinking in terms of 8.5 inches by 11 inches of paper real estate (or 210mm x 297mm for A4) and begin to look at the information presentation to determine the best way to migrate.
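
A rough calculation shows how poorly a portrait page maps to a landscape screen. This Python sketch scales a whole US Letter page to fit a screen while keeping its proportions. The 1024 x 768 screen size and the function name are assumptions for illustration.

```python
def fit_page(page_w_in, page_h_in, screen_w_px, screen_h_px):
    """Scale a whole page to fit a screen, preserving proportions.

    Returns the pixels per inch available and the share of the
    screen area the page image actually covers.
    """
    ppi = min(screen_w_px / page_w_in, screen_h_px / page_h_in)
    used = (page_w_in * ppi) * (page_h_in * ppi)
    return ppi, used / (screen_w_px * screen_h_px)


# US Letter portrait on a hypothetical 1024 x 768 landscape screen.
ppi, share = fit_page(8.5, 11, 1024, 768)
print(f"{ppi:.1f} pixels per inch, {share:.0%} of the screen used")
```

Under these assumptions the full page is displayed at roughly 70 pixels per inch and covers a little over half of the screen, so small type becomes hard to read while much of the display sits empty.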

Once you understand what you have, you will move on to the process of determining the best tools for making your data available in alternative environments. There are as many approaches as there are enterprises around the world. Some of you will need a one-time conversion of all resources and programs to accommodate on-the-fly publishing to paper or screen, while others will favor an approach that reconstitutes the data for the target alternative output devices using batch transform programs. Still others will find combinations of batch and WYSIWYG (What You See Is What You Get) products, including print drivers, batch transform products and import/export schemes through document creation tools, that create the best environment.
Look at proven methodologies, whether you decide to do it yourself using in-house staff and resources, contract out to a service company, or purchase a solution from a vendor. Remember that proven in this case should mean that someone has used the tools to migrate or transform data that is similar to the data you are working with. This is an important point because there are so many variations in data formats, data management methodologies, and application tools.
How can you tell if it is a proven method? To start, ask potential vendors questions about who is using their method and what type of analysis was done before implementation. Be wary of anyone who tells you that it wasn’t necessary to do any analysis or anyone who tells you that their method works regardless of the type of data you have. Potential solution vendors should be asking to see your data and the resources that support that data. They should be asking you about what creates it and how the data is stored. And, finally, they should be offering to run tests for you or give you enough access to their applications to run a test for yourself.

What Can You Re-Purpose?

For most companies this legacy data is the foundation of the corporate information database, which means that it has to be handled as carefully as possible. The starting point for working with it should be the identification of the applications that produce output, whether that output is printed directly from the application or passed to other programs for enhancement, re-purposing, printing, or storage.

Can any application be re-purposed for the web or some other alternative device? While you will always find situations where the cost to re-purpose the data would not be worth it, the answer is that any application can have its output re-purposed. Some applications will be easier than others, and some just shouldn’t be moved. The only real requirements are that, at some point, you need access to all of the input, all of the output streams, and a way to describe and reformat the output for the new target delivery environments.

While we will talk more about moving legacy data to the web or web-enabled/web-friendly environments, the procedures we walk through in the forthcoming pages will work regardless of the actual target output environment, including cell phones, pagers, and electronic ink devices.
Looking through your business applications you should find business-to-business deliverables, customer-deliverables, and internal deliverables. Some of these applications generate paper invoices, bills, and other documents that might be candidates for migration. To keep the job manageable, the best place to start is at the output end of your system applications. Start with the printers. Anything printed on green bar paper or its equivalent, as well as anything printed on a form (electronic or pre-printed) is a candidate for migration to online delivery.


What About Composition Tools?

In this book, we are concentrating on the output of the applications and tools you use, not the tools and applications themselves. That would be a separate book. However, there are a few things you will want to know about the applications you use as you create your inventory lists and do your audits.

Composition tools are the class of mainframe and PC-based products used to add formatting information to text blocks. They come in two basic flavors: tagged/batch and What-You-See-Is-What-You-Get (WYSIWYG).

Batch tools are most commonly found on the mainframe. The most popular batch composition tools remain IBM’s Document Composition Facility (DCF)/SCRIPT and its extension called BookMaster, and a product from Document Sciences, Inc. called CompuSet (formerly XICS). You might also find Waterloo Script, which has a lot of the same history as DCF, but was developed by the University of Waterloo in Canada, and so has some different features than IBM’s product. For example, it directly supported most Xerox printers early in its product cycle, while DCF requires third party add-ons to produce output to Xerox printers.

Most of these products owe their existence to Dr. Charles Goldfarb, a researcher at IBM who defined the method for making content and formatting independent of each other. His original application was for legal documents, but the methodology he defined worked for business documents across the board.

In all of these products, there are sets of tags or controls that authors or document formatting specialists add to the text to cause formatting (and sometimes other processing) to occur. The tagged text files are then composed and the result is a print file.

If you still have these products in use, and many companies do, you will want to know what version and release you are using, what default fonts are in use with these products, and if you use the output of these composition tools with other applications programs.
For example, both CompuSet (which also has a visual design component) and DCF are often found in the insurance industry working in concert with a product originally called DocuMerge (now DocuMaker) from DocuCorp. If you have applications, such as DocuMerge, that rely on composition tools, such as CompuSet or DCF, make note of it on your inventory. Also watch for products like EZ Letter, an older product in Group 1’s family of products that includes DOC1. It is based on tagging and is used for mail-merge applications. If you find any merging application, make note of it because there are always multiple steps to the printer.
As plans are made to re-purpose your data there may be issues resulting from which composition tool is used, or you may find that font specifications and other formatting routines are managed and specified in the composition system and not in the final application. This can make your migration easier if you have a central point of font specification for all application documents, or it can make your migration more difficult if you discover that every individual document has its own set of font specifications. Find the internal experts to help you do the inventory.
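
One way to keep such an inventory is as a simple structured record per application. The Python sketch below is only an illustration: the field names, application names, and font names are all hypothetical, and a spreadsheet would serve just as well.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class AppRecord:
    """One row of a hypothetical migration inventory."""
    name: str
    composition_tool: str = ""        # e.g. "DCF", "CompuSet"
    tool_version: str = ""            # version and release, if known
    default_fonts: list = field(default_factory=list)
    feeds: list = field(default_factory=list)  # downstream applications


inventory = [
    AppRecord("policy-letters", "DCF", "unknown", ["font-a"], ["DocuMerge"]),
    AppRecord("claims-forms", "CompuSet", "unknown", ["font-a", "font-b"]),
    AppRecord("monthly-statements"),
]

# Which fonts are shared across applications?
font_use = Counter(f for app in inventory for f in app.default_fonts)
# Which applications still need someone to identify their composition tool?
needs_research = [app.name for app in inventory if not app.composition_tool]
# Which applications pass their output to another program before printing?
multi_step = [app.name for app in inventory if app.feeds]

print(font_use.most_common())
print(needs_research)
print(multi_step)
```

Even a list this small answers the questions raised above: whether fonts are specified centrally or per document, and which applications take more than one step to reach the printer.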

WYSIWYG tools are generally found on the PC. They may be purpose-built for a specific environment, such as a forms development tool, or they may be general-purpose word processing systems, such as WordPerfect or Microsoft Word. General purpose PC tools often require some additional tool to produce output compatible with host-based applications or to produce output to the AFP and Xerox print environments, so look for critical items like print drivers or third-party utility programs if you know that you use PC-based tools for your development.

If you are using one of a more recent class of composition tools, such as Dialogue from Exstream or Opus from Elixir, they generally provide composition, resource management, and multi-purpose output. Applications using these types of tools should migrate to the web and other devices without difficulty, but always talk to the internal experts and the vendor about how your environment is configured.

... Continued in Wrestling Legacy Data to the Web & Beyond: Practical Solutions for Managers & Technicians from MC2 Books.

 
