on Michael Vorburger's Personal Homepage



TextOnly HTML Converter / HTML++



July/August 1998

This is an alpha version with serious limitations. If anybody out there is interested in continuing this work, please let me know and I can send you the complete sources and ToDo lists... ;-)


The goal of this project was to implement a text-only converter for Web pages. The post-processor takes as input an HTML document with possibly rich formatting, such as tables used for layout and small IMG elements used as bullets. It converts this into an output document in proper HTML format (not ASCII text) that contains only text. This makes pages more accessible and faster to load.

Technically, this led to a C++ class library (framework) representing HTML documents and tags as various objects. A lexical analyser and a parser build a document's structure in memory as a linked C++ object tree. The hierarchical object representation allows a "smarter" text-only conversion than what current text-only scripts, which are mostly written in Perl, can achieve. For example, graphical IMG bullets are recognised and output as UL/LI, and IMG rulers are translated into HR.
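
To illustrate the idea of converting on an object tree rather than on raw text, here is a minimal sketch. It is not the TextOnly source: the Node type, the helper functions and the size thresholds used to recognise bullet and ruler images are all assumptions made for this example, and the lexer and parser are left out (the tree is built by hand).

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical node type; the real TextOnly class names are not documented here.
struct Node {
    enum class Kind { Element, Text };
    Kind kind = Kind::Element;
    std::string name;            // tag name for elements, content for text nodes
    int width = 0, height = 0;   // only meaningful for IMG; 0 means unknown
    std::vector<std::unique_ptr<Node>> children;
};
using NodePtr = std::unique_ptr<Node>;

// Small construction helpers (illustrative only).
NodePtr element(std::string tag) {
    auto n = std::make_unique<Node>();
    n->name = std::move(tag);
    return n;
}
NodePtr text(std::string content) {
    auto n = element(std::move(content));
    n->kind = Node::Kind::Text;
    return n;
}
NodePtr image(int width, int height) {
    assert(width >= 0 && height >= 0);
    auto n = element("IMG");
    n->width = width;
    n->height = height;
    return n;
}

bool isImage(const Node& n) {
    return n.kind == Node::Kind::Element && n.name == "IMG";
}
// Assumed heuristic: a small, roughly icon-sized image is a bullet.
bool isBulletImage(const Node& n) {
    return isImage(n) && n.width > 0 && n.width <= 16 &&
           n.height > 0 && n.height <= 16;
}
// Assumed heuristic: a wide, very flat image is a horizontal ruler.
bool isRulerImage(const Node& n) {
    return isImage(n) && n.width >= 100 && n.height > 0 && n.height <= 8;
}
// A paragraph whose first child is a bullet image is treated as a list item.
bool isBulletParagraph(const Node& n) {
    return n.kind == Node::Kind::Element && n.name == "P" &&
           !n.children.empty() && isBulletImage(*n.children.front());
}

std::string render(const Node& n);

// Renders the children of `parent` from index `from` onwards. Consecutive
// bullet paragraphs are grouped into a single UL, one LI per paragraph.
std::string renderChildren(const Node& parent, std::size_t from = 0) {
    std::string out;
    bool inList = false;
    for (std::size_t i = from; i < parent.children.size(); ++i) {
        const Node& child = *parent.children[i];
        if (isBulletParagraph(child)) {
            if (!inList) { out += "<UL>"; inList = true; }
            out += "<LI>" + renderChildren(child, 1) + "</LI>";  // skip the bullet
        } else {
            if (inList) { out += "</UL>"; inList = false; }
            out += render(child);
        }
    }
    if (inList) out += "</UL>";
    return out;
}

// Text-only rendering of one node. Text is assumed to be HTML-escaped
// already; attributes are not modelled in this sketch.
std::string render(const Node& n) {
    if (n.kind == Node::Kind::Text) return n.name;
    if (isRulerImage(n)) return "<HR>";
    if (isImage(n)) return "";   // any other image is simply dropped
    return "<" + n.name + ">" + renderChildren(n) + "</" + n.name + ">";
}
```

Because the decision is made on sibling nodes of the tree, grouping adjacent bullet paragraphs into one list is a simple loop; a line-oriented script would have to track that state across arbitrary markup. For a BODY holding two paragraphs that each start with a 12x12 image ("First", "Second"), followed by a 400x4 image and a plain paragraph "Done", render returns <BODY><UL><LI>First</LI><LI>Second</LI></UL><HR><P>Done</P></BODY>.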

For related references, see the ALTifier project. Some ideas expressed here have been reused there.


See a sample output generated by TextOnly from this input page.

TextOnly Paper in MS Word format, 18 pages (99 KB)
TextOnly Paper in PDF format, 18 pages (77 KB)
Download textonly.exe Win32 binary (219 KB)


Page last modified on 02-Dec-98
Copyright 1998-99 homepage@vorburger.ch [E-MAIL]


  URL: http://www.vorburger.ch/projects/textonly/index.html