HTML to text converter
Version 1.1, February 1st, 1997

Introduction

This is an HTML to text converter. It removes all tags from HTML files, replaces HTML symbols with their ASCII equivalents, and removes extraneous blank lines.
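
As a rough illustration only (this is not the converter's actual source), the three steps above can be sketched in Python. The function name html_to_text and the small SYMBOLS table are assumptions made for this example, and "removing extraneous blank lines" is taken here to mean collapsing runs of blank lines into one.

```python
import re

# Hypothetical symbol table for illustration; the converter's real table
# is not documented in this file.
SYMBOLS = {"&amp;": "&", "&lt;": "<", "&gt;": ">", "&quot;": '"', "&nbsp;": " "}

def html_to_text(html):
    # Step 1: remove everything from a "<" up to the next ">".
    text = re.sub(r"<[^>]*>", "", html)
    # Step 2: replace known HTML symbols; unknown ones are left untouched.
    text = re.sub(r"&[A-Za-z]+;",
                  lambda m: SYMBOLS.get(m.group(0), m.group(0)), text)
    # Step 3: drop leading blank lines and collapse runs of blank lines.
    out = []
    for line in (l.rstrip() for l in text.splitlines()):
        if line == "" and (not out or out[-1] == ""):
            continue
        out.append(line)
    return "\n".join(out).strip("\n")

if __name__ == "__main__":
    print(html_to_text("<h1>Title</h1>\n\n\n<p>Fish &amp; chips</p>\n"))
```

Tags are stripped before symbols are replaced, so that a symbol such as &lt; which becomes < is not then mistaken for the start of a tag.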

This converter is freely available on the Internet, and is provided free of charge. It may be freely distributed with all files intact, but I would like to be informed if it is included in any package. Users must NOT be charged for this converter.

It is available for Linux at Sunsite. It was compiled under Red Hat release 3.0.3 (Picasso).

There are also versions for OS/2 Warp 3.0, DOS, and Windows 3.1. Download the latest version from your favourite FTP site, or check at my home page.


Installation and use

For Linux, ask your System Administrator to put the executable html2txt into the /usr/bin directory.

For OS/2, DOS, and Windows, place the executable HTML2TXT.EXE (or HTWIN.EXE) in any directory, but preferably in your path.

For OS/2, you may like to create a program object for the executable. Then drop any HTML file onto this object, and it will convert it for you. (You might need the emx.dll runtime library.)

For Windows 3.1/95/NT, create an icon (or a shadow) pointing to the executable. You may need to use the icon's arguments to set the file to convert. Alternatively, using the File Manager, drop the HTML file icon onto the executable's icon.


Usage modes

Usage is simple. Type

html2txt HTML file

where the first argument, HTML file, is the file to convert. The converted text is written to a new file, result.out. Edit it to suit your needs.

Alternatively, type

html2txt -l HTML file

With the -l option, an additional file, html.out, lists all HTML tags and symbols detected in HTML file, and the line numbers where they were found. Any errors detected in the tags may also be listed in this file.
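
To give an idea of what such a listing involves, here is a minimal Python sketch that reports each tag and symbol together with its line number. It is not the code behind the -l option; the name list_tokens and the output shape (a list of pairs) are assumptions for this example, and the actual layout of html.out may differ.

```python
import re

# A tag runs from "<" to the next ">"; a symbol runs from "&" to ";".
TOKEN = re.compile(r"<[^>]*>|&[A-Za-z0-9#]+;")

def list_tokens(html):
    """Return (line number, token) pairs for every tag and symbol found."""
    found = []
    for number, line in enumerate(html.splitlines(), start=1):
        for match in TOKEN.finditer(line):
            found.append((number, match.group(0)))
    return found

if __name__ == "__main__":
    for number, token in list_tokens("<p>A &amp; B</p>\n<br>"):
        print(number, token)
```

This sketch works line by line, so a tag that is split across two lines would not be reported.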

Please keep in mind that all HTML tags begin with a < and end with a >. HTML symbols start with a & and end with a ;. If the HTML source file has a <, >, or & as part of its text, the output file may be missing some text. Run with the -l option and check for any problems.
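
The following Python sketch shows why a stray < can cause missing text. It uses a deliberately naive scanner that treats every < as the start of a tag; this is an assumption made for illustration, and the converter itself may behave differently in detail.

```python
def strip_tags(text):
    """Naive tag remover: skip everything between "<" and the next ">"."""
    out = []
    in_tag = False
    for ch in text:
        if ch == "<":
            in_tag = True          # every "<" is assumed to open a tag
        elif ch == ">" and in_tag:
            in_tag = False         # the next ">" closes it
        elif not in_tag:
            out.append(ch)         # ordinary text is kept
    return "".join(out)

if __name__ == "__main__":
    # The "<" in "a < b" is taken as a tag start, so " b then " is lost.
    print(strip_tags("if a < b then <b>stop</b> now"))
```

Writing the comparison as a &lt; b in the HTML source avoids the problem, since the symbol is only turned back into < after the tags have been removed.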

Licensing

Everybody may freely use and distribute HTML2TXT, as long as all files are intact. I will provide technical support, but I am not responsible for any damage that this software may cause. PLEASE register by informing the author by email or post. A nice postcard would be cool. If you have suggestions for improvement, problems, complaints, or a job (!) for me, I would love to hear from you!

Mr Antonino Iannella
6 Bolingbroke Avenue
DEVON PARK SA 5008
AUSTRALIA

Email antonino@usa.net or nettuno@light.iinet.net.au.

Download latest versions from http://members.tripod.com/~antonino.


Version history

Version 1   - Released, no major bugs.
Version 1.1 - Better handling of invalid and unreal HTML tags.