A Curious Case of Disregarded Robots.txt

The Internet Archive recently announced an apparent change of policy concerning the collection of web sites for their long-term preservation effort.

Before this announcement, it was commonly believed that you could ask the Internet Archive not to make copies of a site by adding a statement to the site’s robots.txt file, which would be honored.

The announcement, posted April 17, 2017, reads in part:

A few months ago we stopped referring to robots.txt files on U.S. government and military web sites for both crawling and displaying web pages (though we respond to removal requests sent to info@archive.org). As we have moved towards broader access it has not caused problems, which we take as a good sign. We are now looking to do this more broadly.

However, as I had already noted on Stack Exchange in March 2017, robots.txt had not been fully honored for at least 10 years:

I just did a quick test, commenting out the ia_archiver Disallow entry for a site that had it for at least the past 10 years. Then I looked the site up on archive.org/web, and it showed grabs it had collected in 2007, 2008, 2009, 2011, 2012, 2013, 2014, 2015, 2016 and 2017! This means that Archive.org never strictly honored what others thought to be a “do not archive” statement during these years; it was merely not exposing the archived copies.

The web site for which I sacrificed the robots.txt continuity was 0xAA.org (an event we used to organize), and here is a screenshot taken during the test:

Archive.org's overview of 0xAA.org saves on March 29, 2017

The original robots.txt file was reinstated after the test, again leading to the familiar “Page cannot be displayed due to robots.txt.” message. The overview also shows captures from before 2007, and the copies saved by the Internet Archive show the exclusion rule to be present already in 2003. However, I did not have the backups at hand to double-check the change history of that older period.

When I originally found this out, I thought that it was interesting to know, but not necessarily newsworthy. However, the blog post that followed seemed to express an official change in direction that gave me second thoughts. While robots.txt is not an official, compulsory declaration of permission or non-permission, neither is the email takedown request mechanism which is proposed as an alternative. I believe in the importance of asking before doing, and it felt like robots.txt was a good element to maintain a balance of acceptance. If the Internet Archive was in a legal gray area with its saving and making available of copyrighted works, could things get worse once it stops respecting robots.txt?

 

I am a big admirer of the Internet Archive. I can’t imagine the loss of the early web of the 1990s, which they worked so hard to preserve. I visited them several times, I have friends working there, and I would entrust them with some of our own works, if something happened to us.

Having said that, does the public not deserve full disclosure and transparency, rather than what may be seen as a careful exercise in obfuscation? “A few months” (limited to “U.S. government and military web sites”) and “10 years” (on all sites?) are not the same thing. We are reassured that this new data harvesting “has not caused problems”. But what about the “right to be forgotten”?

Individual users can already collect screen grabs (like I did), or save pages, or print them. But we have learned that our traditional rules don’t always scale to what becomes possible in a massively automated new world order.

These web grabs, like truth itself, may be helpful, or haunting. What if a version containing a tragic error were to be preserved against the will of the publisher? What about our juvenile mistakes? How long until somebody requests these “few months” (10+ years) for court use with a simple subpoena?

I do not oppose preserving the public web for posterity, even against the will of the original content publishers. I cited some more difficult test cases before, in what I find a fascinating voyage between “free will” and “encrypted mind”. However, I am concerned about making that material available earlier, in a way that goes against the free choice of individuals who may have something to say about that content, and about this not being disclosed with the transparency and debate it probably deserves.

Some thoughts on how this could be improved:

  • Any organization like the Internet Archive should enjoy the privileges and responsibilities that come with Library status, including special powers to archive works while they are still protected by copyright, as well as being protected under laws which would otherwise prohibit the circumvention of access-control measures. This could include not just web content, but also software (including copy-protected games) and other digital content. I am aware of precedents, e.g. National Libraries in some states of the former Yugoslavia, where it became necessary for each to individually preserve the works of a fragmenting country.
  • Robots.txt and/or the <meta> tag could be extended to separately express consent to long-term preservation and consent to dissemination of cached versions during the copyright term (or another shorter period, which could be specified). Adhering to this might not be a universal requirement, but at least the original intention could be taken into account later.
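As a purely hypothetical illustration (the directive names below are mine, not part of any robots.txt convention today), such a consent declaration might look like this:

User-agent: ia_archiver
Allow-preservation: yes
Allow-dissemination: 2067-01-01

The first line reuses the existing user-agent targeting mechanism; the two invented directives separately express consent to long-term archiving and the date from which cached copies may be made public.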

Safer Windows

Introduction

Since my early days on 8-bit and Amiga systems, I have had the privilege of watching friends, family and customers interact with their devices and with the software we were creating. I always learn something useful from these shared expectations and frustrations, and I do sometimes feel guilty for the state of our industry. As calls for help increasingly shift from “crapware” to “ransomware” [1], here is my list of tips to make using Windows a safer and better experience for everyday users, both at home and at work.

While a few of the references should be solid enough even for some of the more technically minded, this aims to be a page I can refer friends and family to.

What Windows Version?

The other day, my dentist showed me his 3D imaging system, asking whether it was OK to still use Windows XP. Sure, I said, it might work virtually forever, as long as you keep it isolated (no internet, no untrusted software or USB keys, etc.).

If you have to use an older version of Windows like XP, my advice is to invest some time to put it into a virtual machine (Hyper-V or similar), which allows for long-term preservation and offers enhanced isolation and the possibility to more easily revert to a previous state. If your primary PC still uses Windows XP (or Windows Vista), and it is connected to the internet, then stop now and change that, because no third-party firewall or antivirus application can guarantee you a safe and smooth experience.

If you are using Windows 7-10 you are in much better hands. Since there are supported ways to do a free upgrade to Windows 10 even after the official July 2016 expiration (e.g. via the Assistive Technologies Offer), consider doing it sooner rather than later. Were it not for the forced reboots with loss of unsaved data when you leave the desk to grab a quick coffee, the disappearance of the Recent Items entry in the Start menu and the inconsistent style of its many user interface layers, I would say that Windows 10 is better than Windows 7 in all respects. It still is the best Windows version ever, and it will be supported for a long time.

If you are going to install from scratch, use a 64-bit version of Windows (which is a bit more robust also in terms of security) rather than a 32-bit one (which can’t be upgraded to 64-bit later).

Uninstall Flash and Java

Flash, together with Java, consistently tops the lists of security vulnerabilities that are exploited by malware as we browse the web or otherwise handle untrusted content.

The biggest loss you may experience by uninstalling Flash is that some older sites may not show some video content. Also, thanks in part to Apple’s iOS leading the way in not supporting Flash, well-designed sites have been offering the same content using HTML5 technologies for years.

If you absolutely need to use Java (e.g. I am aware of some ancient e-government apps), either you know what you are doing (in which case you will keep it always updated), or you should do that from the safety of a virtual machine.

So if you want your system to become lighter and safer with only a few clicks, go to the Control Panel, look for Flash or Java in the list of installed Programs, and uninstall them. If there are multiple entries, uninstall them all, starting with the newest.

(Note that Java and JavaScript are two different things: I am recommending that you uninstall Java, not that you disable JavaScript in the browser settings.)

Update, Update, Update

It is “Patch Tuesday” as I am writing this, and like every second Tuesday of the month, Microsoft and others have released their latest wave of security updates.

If you are not sure whether your system is up to date, start Windows Update from the Control Panel (or Update & security in the newer Windows 10 Settings), and run a manual check. Install all updates and service packs it finds, except for optional addons you may not need, like Windows Live Essentials and Microsoft Silverlight.

You will also want to approve any updates for third-party essentials like Adobe Acrobat. Browsers like Firefox or Chrome update themselves automatically nowadays, but you should allow them to restart if prompted.

The reason why some of my non-technical friends don’t install Flash or Java updates is that they are afraid of doing something wrong. It’s OK to be careful, but there is a short list of common applications (and malware targets) which should be updated no matter what. These include Acrobat, Flash and Java (unless, like me, you uninstalled both Flash and Java a long time ago).

Unfortunately you get what you pay for with some free apps, so be careful as some update installers prompt you for more choices until they get what they want. Remember when you opted out of that “Free Search Bar!” that was offered when you installed your favorite PDF utility? Well, you should carefully keep that choice in mind with each update, because the installer may conveniently forget about it.

An example of such distrust-inspiring behavior comes from Microsoft’s own Skype, as its updater tries to reset the browser homepage and default search engine with each and every installation.

Skype Update preselects Bing, MSN

Sure, you can try and ask once, but is it OK to “forget” about the previous choice with each update? I think that this reduces consumer confidence in updates (remember why some people are afraid of going ahead with updates?), as well as setting a bad precedent for third parties. Respect for all previous setup choices should be part of Windows logo and marketplace requirements.

Malware Basics

Don’t expect any more Microsoft bashing from me now, because I actually believe that Microsoft’s Windows Defender (formerly Security Essentials) is a good antimalware (antivirus, antispyware) application. Being free, it’s one of those rare cases where paying more doesn’t necessarily give you more. You also don’t have to worry about subscription renewals.

I recommend it not only because it plays nicely with the system without widening the attack surface, as others do [2], but also because the health of the Windows ecosystem is by definition a top priority for Microsoft. There is no conflict of interest, whereas it could be argued that some third parties benefit from sales of antivirus subscriptions, and there is a temptation to amplify lists of items “found” and other fearsome details.

An expired antivirus application is not much better than no defense at all, so it should be uninstalled. Similarly, running two or more antivirus applications at the same time may create issues without achieving the desired protection. As antimalware applications share limitations in heuristic and other mechanisms, ultimately an informed and defensive user mindset (like following some of these tips) is what helps raise the security bar to the highest levels.

While I would prefer to use no antivirus at all, I too may occasionally run an additional one-time scan by booting with an independent tool, based on specific needs, for example if a system has already been compromised. Since originally writing this, Microsoft added a new Windows Defender Offline feature to Windows 10. It can be found under Settings/Update & security/Windows Defender. That too is a good tool to use from time to time.

Once, a friend was insisting that a specific “tool” was the only one that would find his constantly reoccurring “malware”, until we agreed that he had simply found exactly what he had been looking for on the internet. So, be careful with what you search for, as a download from an untrusted source may make matters worse, instead of delivering a solution.

Lastly, beware of “registry cleaners”, “download accelerators” and other utilities that promise wonders at no cost. These are the modern equivalent of snake oil, and should be avoided. If it were either necessary or easy to “clean” the registry, it would have been done by Windows. The best way to make the system faster and lighter without disrupting functionality or privacy is to not install certain types of software. I would also avoid any app that tries to install browser add-ons and other advertising-related tools.

Resist the Dumbing Down

If you followed the tips up to this point, you have a good version of Windows and you already significantly reduced the attack surface used by the most frequent exploits.

Are you willing to go the extra mile and learn some new things about files and Windows access control mechanisms? Then you can decide for yourself how to use that information, and whether you like the fact that as a user you are increasingly being shielded from these details, limiting your potential to learn and improve, without getting any extra protection in return.

Some versions ago it was decided that Windows should hide file name suffixes (endings like .txt, .pdf, .png, .mp3, .exe, etc.). In this simplified world model, users would no longer need to know that file suffixes even existed, or that they could be associated with a default file opening action, nor could even the most informed people appreciate the risk of something named “Virus.txt.exe” (which indicates an executable, not a text file), as that danger-revealing “.exe” part would be hidden.

You can undo this damage by going to the Folder Options and unchecking View/Hide extensions for known file types. Similarly, in the View tab of Windows 10’s File Explorer toolbar, you can check File name extensions. This won’t make anyone a computer expert overnight, but it encourages transparency and education, rather than preventing it.

Already back in the 1990s, the Amiga and Mac operating systems both had mechanisms whereby file types could be recognized by their content in addition (or as an alternative) to their suffix. Even in modern Windows, there are XML-based file formats such as .docx and .rp9 where information inside the file provides further hints at which applications should best process the data. So, there is no absolute rule that file suffixes universally describe the content and the opening application. However, they have been effective at this for decades, even on systems where this is not a requirement. Even if that weren’t the case, there is no reason to hide parts of a file name, especially as Windows has two additional levels of protection from accidental suffix renames (one is the default selection that is set to the file name minus the suffix, and the other is a warning dialog if you change the suffix).
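To make content-based recognition concrete, here is a minimal Python sketch that identifies a few common formats by their leading “magic” bytes, ignoring the file name entirely (the signature list is deliberately tiny):

import sys

SIGNATURES = {
    b"\xff\xd8\xff": "JPEG image",
    b"\x89PNG\r\n\x1a\n": "PNG image",
    b"%PDF": "PDF document",
    b"MZ": "Windows executable",
}

def sniff(path):
    with open(path, "rb") as f:
        head = f.read(16)  # the signatures above all fit in 16 bytes
    for magic, kind in SIGNATURES.items():
        if head.startswith(magic):
            return kind
    return "unknown"

if __name__ == "__main__":
    for name in sys.argv[1:]:
        print(name, "->", sniff(name))

Pointed at a file named “Virus.txt.exe”, a sniffer like this reports “Windows executable”, no matter what the (possibly hidden) suffix suggests.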

Don’t Run that File

Now you should be able to visually recognize PDF files from their .pdf suffix, potentially dangerous executables from the .exe ending, etc., but what about less common suffixes, like .spf? Any suffix that you are unfamiliar with is something you would have to research online.

Or, you may ask, why isn’t there some universal access control mechanism whereby Windows doesn’t even run something that you (or an administrator) haven’t deemed safe to install? There is, and it comes in the form of “Software Restriction Policies” and “Parental Controls”. These features allow the system to be configured so that it will only run executables installed by an administrator inside directories that were meant to run executable code (“Program Files”, etc.), where no code can be added by non-administrators. The rules can further be refined so as to only allow the execution of digitally signed code. Taken together, these settings raise the bar in a way that leaves little margin for situations where unauthorized code is run by mistake by everyday users.

For some reason, these great features are not enabled by default in Windows. However, there is no excuse for not having them enabled at least in a corporate environment. Should your IT staff ever ask “Why did you run that attachment?”, you might as well ask them “Why did you not enable Software Restriction Policies?”
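If you are curious whether a machine has these policies configured at all, a small Python sketch can peek at the registry. The key path and the meaning of the DefaultLevel values below reflect my understanding of where Windows stores Software Restriction Policies, so treat them as assumptions to verify on your system:

import winreg

# Assumed location of Software Restriction Policies settings
SAFER_KEY = r"SOFTWARE\Policies\Microsoft\Windows\Safer\CodeIdentifiers"

try:
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, SAFER_KEY) as key:
        level, _ = winreg.QueryValueEx(key, "DefaultLevel")
        # 0x00000 should mean Disallowed (only explicitly allowed code runs),
        # 0x40000 Unrestricted (everything runs unless explicitly blocked)
        print("SRP DefaultLevel: 0x%X" % level)
except FileNotFoundError:
    print("No Software Restriction Policies configured")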

UAC Is Good

The latest versions of Windows are especially effective at shielding users [3]. However, this protection only works as long as you don’t run code with administrative privileges.

This means that you should not ignore (or disable) those Windows “UAC” (User Account Control) prompts, i.e. those requests that darken the area all around the dialog. When they pop up while you are updating a trusted application, it is generally safe to approve the request. Otherwise, you should think at least twice, because you are about to give special permissions to some potentially untrusted code. This may even occur unintentionally if you are working with administrator user privileges, and you disabled the UAC system option because you found it to be “annoying”.

Most malware is able to penetrate an otherwise well-maintained system because of a brief lapse of this rule. Once you allow some unknown code (e.g. something “found” on a USB stick) to run with administrative privileges, you can’t be sure that your system is yours anymore.

Ransomware

“Ransomware”, i.e. malware that encrypts your data and then asks for money to decrypt it, is actually what made me want to share these notes. I hear it happening all around me. It is one of those things that makes me feel bad for the state of this industry, and also for the non-technical people who get bashed by their IT department, which often could have prevented it instead.

In a majority of cases, we are all able to recognize something as inappropriate, but sometimes an email with the right subject manages to slip through at exactly the right time, maybe when we were expecting an account statement, or a courier, and that’s when disaster may strike.

If I had reason to believe that my system had been compromised by some “ransomware”, I would unplug the network cable and hibernate or shut down the system, before more damage occurs. First though, if there was a specific message demanding payment, I would take a photograph of it (you can’t completely rely on a screenshot, as that too could be lost). Hibernation may have some advantages, in that it could help preserve details stored in memory that could later aid in a decryption.

Then, consider paying. Or at least, don’t delete the payment instructions, nor the encrypted files. You might change your mind later, or a free recovery tool might become available [4]. Even law enforcement agencies like the FBI have advised considering payment, if the data is important. After all, a bitcoin isn’t that steep a price to pay. In any case, should you decide to restore the system on your own, you should make a copy of the encrypted data, for possible future recovery.

Online, Offline and Travel

Do you consider yourself or your work important enough to be the possible object of a targeted attack? If so, my advice is to work on two different systems: a “vulnerable” one, connected to the internet, and another more secure system, which should be offline at all times. If this feels almost like wearing a tinfoil hat, I can only agree.

At the same time, you might benefit from two systems anyway:

  • A PC for offline work (distraction-free productivity?), a notebook for online work
  • You can carry the notebook with you when traveling
  • One can be the backup of the other (for productivity applications, not for long-term data storage, which should be covered via backups)

In this scenario, a light notebook could act as both the “internet computer” and the “travel computer”. Since this system would be at higher risk of theft or exposure anyway, it makes sense for it not to be a container of sensitive data. It could be set up to only store a limited period of email history locally, and to access more important data only via a secure connection, which needs to be authorized via a password, smart card or other secure mechanism.

If you travel a lot, a situation that may leave you with mixed feelings even more than a stolen notebook or phone is being asked at customs to hand over your devices, providing your fingerprint or other access credentials. This has been happening for years in countries we associate with democracy and freedom (Canada, Israel, United Kingdom, United States, etc.), even on arrivals by train. If you don’t comply, you might be denied entry, or even arrested. I consider myself an “I don’t have anything to hide” person, but I admit that I am not totally comfortable with this scenario. While I see it as a necessary annoyance to counter crimes ranging from child pornography to terrorism, I am also concerned with what else may happen with this data, perhaps years into the future.

As our devices increasingly become our mind extensions, will our habits change in the way we cross jurisdictions? Will our future nakedness be one of travels without devices, or one where everyone may read into those devices (and perhaps our minds)? I don’t know the answer, but it makes me think about that tinfoil hat…

Post Scriptum

As of May 2017, a massive wave of ransomware known as “WannaCry” or “WannaCrypt” has been making the rounds. It exploits a vulnerability that had been addressed by Microsoft (KB4012598) in the March 2017 “Patch Tuesday”. Systems that ran Windows Update since then would not have been affected.

While it originally seemed that there were no such updates for Windows XP and Windows Server 2003, which are outside of the normal support timeframe, Microsoft later also made available the KB4012598 updates for these older systems. If you are still running one of these Windows versions, you can download the update here.

Interestingly, the updates that were released to the public in May 2017 had been finalized as early as February 11, 2017.

References

1. Ransomware: Past, Present, and Future
Cisco Talos Blog, 2016

2. Joxean Koret
Breaking Antivirus Software
44CON, 2014

3. Vasily Bukasov and Dmitry Schelkunov
Under the hood of modern HIPS-es and Windows access control mechanisms
Defcon Russia 2014 (DCG #7812)

4. Lawrence Abrams
TeslaCrypt Shuts Down and Releases Master Decryption Key
BleepingComputer, 2016

Optimal Conversion of DSC-F1 PMP to JPEG

If, like myself, you have several thousand pictures taken with a Sony DSC-F1 camera (a 1997 model), you are probably looking for a good solution to preserve these for the future with no loss of information. The DSC-F1 camera stores files using Sony’s proprietary PMP format, which is essentially JPEG with a custom 124-byte header. The header contains information like date taken, picture orientation, shutter and aperture details, etc. Nowadays these mostly camera-specific fields are encoded by embedding EXIF/DCF metadata in the JPEG file.

I would like to convert all my PMP files to JPEG (with EXIF metadata) files, because that is the format currently universally accepted both by operating systems and by album applications. Whatever new format emerges (e.g. JPEG 2000 with EXIF-equivalent metadata), I am quite confident that a similar conversion will be supported in the future.

I couldn’t find a piece of software able to do all of the following:

  • Conversion of .pmp to .jpeg file(s)
  • Ability to convert selection of files or entire directories
  • Lossless “conversion” of JPEG portion
  • Conversion of all PMP header data to EXIF metadata
  • Option to delete original file(s) after successful conversion
  • Option to perform lossless rotation of JPEG image to reflect orientation indicated in PMP header (resulting JPEG oriented as shown by Sony Digital Still Camera Album Utility)
  • Option to apply original PMP file date to JPEG file
  • Ideally, ability to perform reverse conversion (from JPEG with EXIF to PMP), which would simplify comparisons and integrity checks

Tempest Solutions offers a free tool, Pump, which can perform a lossless conversion, but it does not support writing the PMP header metadata as EXIF. (Special thanks to Chris Klingebiel of Tempest Solutions for sharing information about PMP file format details.)
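To show how thin the lossless core of such a tool would be, here is a minimal Python sketch based only on the format description above (a fixed 124-byte header followed by ordinary JPEG data). It deliberately skips the PMP-to-EXIF metadata mapping, which would first require confirming the header field offsets:

import os

PMP_HEADER_SIZE = 124  # Sony PMP: custom header, then a plain JPEG stream

def pmp_to_jpeg(src, dst):
    with open(src, "rb") as f:
        header = f.read(PMP_HEADER_SIZE)  # date, orientation, shutter, aperture...
        jpeg = f.read()
    if not jpeg.startswith(b"\xff\xd8"):  # JPEG start-of-image marker
        raise ValueError("no JPEG data after PMP header: " + src)
    with open(dst, "wb") as f:
        f.write(jpeg)  # lossless: the JPEG bitstream is copied untouched
    st = os.stat(src)
    os.utime(dst, (st.st_atime, st.st_mtime))  # apply original file date

The reverse conversion would simply prepend the saved 124-byte header, which is also why a round-trip integrity check would be easy to add.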

I already use ACDSee 7 to automatically perform lossless rotations of JPEG images, based on existing EXIF rotation attributes, so the requirement to perform the rotation in the conversion tool is not really important. However, this adds to the importance of properly converting PMP rotation attributes to EXIF rotation information.

The purpose of this page is to request feedback from other DSC-F1 camera users, and especially to see who may be interested in a tool fitting the above description. If there is sufficient interest, it will raise the priority for me to invest in the creation of such a conversion tool, and I will let you know when the tool is ready.


Proposal for Setting Canonical Host via Robots.txt

This is a proposal to indicate a preferred host name (e.g. domain with or without “www”) for search engine robots by adding a “Canonical-host” entry to the robots.txt file.

Valid host values are as per RFC 2396 and RFC 2732, i.e. “hostname | IPv4address | [IPv6address]”.

For example:

User-agent: *
Canonical-host: www.example.com

or

User-agent: *
Canonical-host: example.com

or

User-agent: *
Canonical-host: host2.example.com

or

User-agent: *
Canonical-host: 10.20.30.40

or

User-agent: *
Canonical-host: [FC00:AA10:BB20:CC30:DD40:EE50:FF70:CA40]
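On the consuming side, extracting the proposed token takes only a few lines. A minimal Python sketch (remember that Canonical-host is being proposed here, so no existing robots.txt library knows about it):

def canonical_host(robots_txt):
    # Comments start with "#" as per the robots.txt conventions
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()
        if line.lower().startswith("canonical-host:"):
            return line.split(":", 1)[1].strip()
    return None

For example, canonical_host("User-agent: *\nCanonical-host: www.example.com") returns "www.example.com". Splitting only on the first colon keeps bracketed IPv6 values like the last example above intact.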

Rationale

It has always been a common practice to make a web site accessible both with and without a “www” host name. This remains the way sites are almost always configured by default by an ISP under managed hosting plans. While potentially interesting from a usability standpoint (both www.example.com and its shorter form, example.com, will work when typed in a browser’s address field), this results in several problems as soon as the different URLs pointing to the same host are published on the web, in spite of the site maintainer’s preference for one specific form.

Known issues include:

  • Discrepancy between search engine result URLs and URL preference of site maintainers
  • Inconsistencies within search engine results (some pages on a site listed with “www”, others without)
  • Same site with and without “www” is listed as having a different “page rank”
  • Failed matches between search engine results and categorization schemes
  • Difficulty, for a search engine, to accurately determine whether different URLs pointing to the same IP address (e.g. HTTP 1.1 virtual hosts) are actually meant to point to the same web site, or not (after all, the content itself can change between crawls)
  • No clarity about number of actual sites indexed by search engines (are different uncanonized URLs pointing to the same web site counted multiple times?)

Solutions to this are limited in part because:

  • Inconsistent incoming links (e.g. with and without “www”) are not under the control of a site’s maintainer
  • While HTTP redirects could be used to express a preference (e.g. by permanently redirecting accesses to example.com to www.example.com, or vice versa), not all managed hosting providers give the customer access to such configuration options

Resorting to robots.txt to solve this problem comes naturally, for several reasons:

  • Robots.txt provides a method “for encoding instructions to visiting robots”
  • Robots.txt is popular among robots
  • Robots.txt is always accessible by a site’s maintainer
  • Robots.txt is already site-centered (one robots.txt per site)
  • Martijn Koster’s “A Method for Web Robots Control” RFC allows for extensions to the robots.txt format (“extension = token : *space value [comment] CRLF”)

While this discussion centers on the presence or lack of the “www” host name, which is a very practical and frequent issue, the aim is to propose a flexible solution that can be applied to other situations as well.

Conclusion

In consideration of the above, the proposal is made to define an extension token named “Canonical-host”, allowing the maintainer of a web site to indicate a preferred host name value to be used by robots to access and index the site.

More specifically:

  • Robots should interpret and follow this preference in the same way as they would process a permanent HTTP redirect (status 301)
  • Search engines and web categorization systems (“directories”) should consider the preference as a request to update their host name records, if required

Post Scriptum: Robots.txt vs. rel=”canonical”

In 2009 the major search engines announced support for the rel=”canonical” attribute.

Although it takes a per-page rather than per-site perspective, the new implementation addresses many of the needs covered by this proposal. At the same time, though, it requires adding a tag on each page, and it cannot be applied to scenarios where the content administrator has no control over the HTML headers, e.g. with many CMS systems, or with web services. Not to mention non-HTML content (audio, video, images, etc.).

As of 2012, both Yandex and Google are supporting a “Host” directive that is substantially the same as the one I was proposing under “Canonical-host”.

Of Millennium Bugs and Millennium Myths

“We have uniformly rejected all letters and declined all discussion upon the question of when the present century ends, as it is one of the most absurd that can engage the public attention, and we are astonished to find it has been the subject of so much dispute, since it appears plain. The present century will not terminate till January 1, 1801, unless it can be made out that 99 are 100… It is a silly, childish discussion, and only exposes the want of brains of those who maintain a contrary opinion to that we have stated”

The Times (London, Morning Edition, December 26, 1799)

It has been just so in all my inventions. The first step is an intuition – and comes with a burst, then difficulties arise. This thing gives out and then that – “bugs” – as such little faults and difficulties are called – show themselves and months of anxious watching, study and labor are requisite before commercial success – or failure – is certainly reached.

Thomas Alva Edison (Letter to Theodore Puskas, November 18, 1878)

Introduction

As everybody probably knows, the “Y2K” problem is what will happen to some poorly-designed (or very, very old) systems which only use the last two digits of the year to perform their date logic, and which fail when performing certain calculations which involve years after 1999. I don’t have a problem with “The Year 2000 Problem”, and I am not going to add my voice to the countless speculations of how much it is going to cost to fix this, or the never-ending lists of the different things containing one or more silicon chips (even if they do not use any date functions), or the frightening predictions of the catastrophes that will result from our dependence upon computers. Being a bug fixer by profession, I would however like to uncover some of the myths which are increasingly surrounding this issue. Some of them are so frequent and so misleading that they could proudly top the list of this millennium’s urban legends.

Not a Millennium Bug

Since this issue is mostly related to the use of two digits to indicate the year, it occurs every 100 (not 1000) years. This means that the “millennium bug” is actually a… century bug.
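A two-line worked example (in Python, as convenient pseudocode; the years are of course made up) shows exactly where the two-digit arithmetic breaks:

# Two-digit year arithmetic works within a century...
print(99 - 65)   # age in '99 of someone born in '65: 34

# ...and fails as soon as the century rolls over
print(0 - 65)    # the same person in '00: -65 years old

The failure has nothing to do with the year 2000 being special; any “00” boundary, once per century, will do.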

Good Excuse for a Party

No doubt that the year 2000 is related to this issue, and that there will be some big parties at midnight, December 31, 1999. But this date marks only the first day of the Y2K problem, not the beginning of a new millennium.

Just to clarify: we are talking about the system used to date years as per “Common Era”. I prefer not to relate this to any particular religion, and those who have an interest in these matters probably know that Dionysius Exiguus, based on the literature available today, was wrong by several years when he made his calculations, so there appears to be no particularly well-known religious or historical event that occurred seven days before the date of 0001-01-01 as per the ISO 8601 standard. This illustrates an aspect that appears to be confusing to many: the first year was year 1, not year 0. And the year before that was year “-1” (or year 1 Before the Common Era). All history books which use Common Era dates report events in this way. Introducing a year 0 would require that all dates before 0001-01-01 be changed with respect to how they are documented now. This is unlike how we refer to aging: when a baby is three months old, that’s zero (not one) years and three months. Months, days, centuries and millennia also start counting from 1, and we consider that normal, don’t we?

It should therefore not be difficult to accept that the first millennium of this calendar system began on January 1st of the year 1, and ended on December 31 of the year 1000. There was no “year 0” just like there was no “century 0”, which is why we refer to years in the range 1901-2000 as being part of the 20th century, and not the 19th century, which has never been the object of any doubts or discussions. Accordingly, the third millennium (and not the second millennium) will start on January 1st, 2001 (not 2000), which is also the first day of the twenty-first century. By this definition, the millennium bug is neither a millennium bug nor a century bug, as it is not related to the beginning of any century or millennium, unless we want to consider the end of any arbitrary range of 100 or 1000 years a good excuse to party.

Actually, I am told that 100 years ago they had two big parties: one at the end of 1899, and one at the end of 1900! In both cases people celebrated the beginning of the… 20th century.

Bug of the Millenium

“Millenium” written with one “n” is probably one of the spelling bugs of the millennium. It is neither English nor Latin. With an accent, it could be French. Yet if you do a site search for “millenium” on sites like microsoft.com, the hundreds of pages your favorite search engine will find are in… English.

Nothing New

Just like the origin of the term “bug” to define a technical problem is not related to the moth which indeed was found inside the Mark II computer in 1947 (both “bug” and “debug” were in use long before that, as several dictionaries and writings confirm), the “millennium bug” has little to do with computers and programmers.

If I look around me, I can see dozens of non-computing examples of this issue, but I cannot remember having ever read about how much it costs, each decade, century or millennium, to fix them. As I write this, I have on my desk a booklet of blank forms, first copy white, second copy pink, from my grandfather’s company, in which the date field was introduced by a pre-printed “195_”. In 1960 the forms had to be converted to notepads, and they are still in use after 50 years. A few days ago I read in a local newspaper about a funeral parlor that still has hundreds of gravestones with “19__” pre-sculpted. My grandfather did it with his forms, and IBM did the same with its inventory of punched cards after magnetic media became the preferred medium, but I wonder what will happen next year to these stones… Will they put them in their office, next to the telephone, to quickly engrave some notes?

Looking back at old writings, even of several centuries ago, one will notice that years are often written as two digits. Using two digits instead of three, four, five, or whatever is necessary to fully express a date not only saves bytes in computer storage (by the way: using two digits instead of four saves one byte, not two, since a byte can store two decimal digits) or two precious columns on a punched card, but is also more convenient to write. The use of two digits instead of four has always been common even in printed material, where the advantages are considerably more limited, and measured in very small amounts of ink and paper, since the work is done by a machine. This would seem to be a strong confirmation of the fact that the issue is much more about general culture and habits, rather than specifically a computing choice.
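The one-byte aside refers to packed (“binary-coded”) decimal storage, in which each byte holds two decimal digits; a tiny Python illustration:

year4 = bytes([0x19, 0x99])      # "1999" packed two digits per byte: two bytes
year2 = bytes([0x99])            # "99" packed the same way: one byte
print(len(year4) - len(year2))   # 1 -- one byte saved, not two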

And if writing were not enough, even our oral culture reflects this tradition: we say “sixties” instead of “nineteen-sixties”, and in many languages when recollecting events and personal experiences only two-digit years are used (in some languages the “nineteen…” or other century prefix is important to recognize a date context). Two-digit numbers are also used when talking about car models and software products up to the year 1999. Wherever people talk, whether in the streets or on television, this is the language that is most frequently used.

Conclusion

It is my opinion that, in an ideal world in which everybody consistently used four digits to write the year, and except for a few extreme and now extinct cases like those obsolete punched cards mentioned before, which could store only 80 characters each, programmers would never have been a significant exception, and we would now not be talking about a “millennium bug”. Even in our imperfect world, good teachers told their students to watch out for this problem several decades ago, and good engineers and programmers planned ahead and avoided the problem altogether without anybody having to tell them so.

In theory, if we learned from this experience, date-dependent code would from now on have to be written so as to allow for the representation of infinite time. Well-written specifications should take into consideration that four digits is the minimum to be used for dates, and not a constant. After all, we’ve already used up 20% of the range of years which can be defined using four digits. Anyway, I am sure that within the next few decades computers will become smart enough to work around these human limitations, and fix our bugs.

To be honest, there are a few places in which I wouldn’t want to be if a problem really emerged at midnight, December 31, 1999. A plane and an intensive care unit come to mind. But hearing of people stockpiling food, gasoline and money makes me wonder more about our insecurities in Y2K society than about possible software problems. About the “bug” itself, I do remain an optimist, and I am sure that when confronted with the actual problem this issue will be solved surprisingly quickly. In January, 2000, for all the organizations involved, the “millennium bug” will ultimately become a “fix or die” issue.

As they say, time solves everything…

 

P.S.: Speaking of calendars, wouldn’t you love a year consisting of 13 perfect 4-week months (13×28=364, plus one extra day each year, and the usual leap day every four years)? The idea is not new, but it somehow got lost a few millennia ago.

 

© 1998, 1999 Mike Battilana

Thoughts on Unsolicited Email Advertising

Last revision April 6, 2003. Original text published January 24, 1998. © 1998-2003 Mike Battilana. SPAM is either a registered trademark or a trademark of Hormel Foods Corporation in the United States and/or other countries.


Introduction

I originally wrote this text after my outgoing emails were not being delivered any more as a consequence of an episode of unsolicited “bulk mail” advertising. You can still find the full story below. While most of this content is country-neutral, some episodes are linked to my work experience with US and Italian companies.

Over the years, mostly thanks to this page, I came in contact with a variety of other users (both frustrated by some of the unsolicited email they receive, and also those who prefer to consider this “free speech”) as well as organizations trying to deal with unsolicited email advertising in one way or another, and even a few people who said they found this text useful in studying possible regulations for their respective countries. The phenomenon itself evolved, and so did these notes.

If you are not familiar with the type of email I am talking about, have a look at what people like myself are getting every day as of early August 2001:

Inbox

Each frame of the above animation shows a set of titles, dates, and the first few lines of the messages which every day I have to receive and sort. Somebody else could be doing this, but it wouldn’t be email any more: real time, fast, private, or working day and night if necessary. I removed a few duplicates and some numbers and character codes which sometimes appeared in the message title for what I believe were tracking purposes. I obviously also receive “normal” email, which is not shown here.

Needless to say, I don’t know the individuals or organizations sending out these emails to millions of internet users like myself. Even if some of these mails appear to imply that the recipient subscribed or otherwise expressed interest in the topic (e.g. “Below is the result of your feedback form” or “This E-mail is never sent unsolicited”), that does not apply to me, neither for the “wonder drugs”, nor for the religious texts, nor for the money-related and all other “services” offered in these emails. If you think that the above collection is a wasteful celebration of stupidity more than an expression of free speech, rest assured that it’s all real. I did not pick the funniest messages, or the ones with the most spelling errors, or the ones trying to target the people who are most desperate and in need of help.

I have talked to some legislators who were not at all familiar with the issue, either because they did not use email, or because they did not access it themselves, or because they never had their address appearing on a web site or newsgroup discussion. This has been one reason for me to add this updated introduction.

I know, I know, I shouldn’t put my email address in public places, is that what you are thinking? Or maybe I should just give up, and use several different addresses, perhaps making people sign contracts to not insert the “good ones” in files where a human could understand them, but a good address extraction robot couldn’t? Well… I hereby would like to affirm the right (the freedom, if you prefer) and pleasure to be reachable by interesting people like yourself, and not to have to hide from email robots programmed to harvest addresses and then bomb me with “AS SEEN ON NATIONAL TV!”, “YOUR A WINNER!!!”, “Claim Your Complementary Digital Camera” and “Do Not Repay Your Credit Card Debt” messages. Also, I like the idea of being able to use just one email address, and to allow real people, not advertisers, to be able to use it for a very, very long time. Email addresses are one of the few things in technology which have the nice potential to offer some long-term stability. No, I am not actually going around posting my address on web sites, but, just to mention an example, my email address happens to be included in documents which in part have to be made available online. Also, I would like friends to be able to find me, if they lost or never had my address, and I would like to be able to use internet newsgroups the way they were meant to be used, not in a constant challenge in which addresses have to be artificially modified with “nospam” prefixes, suffixes and other formulas which sometimes robots can sort out, but users who are not technically familiar with email can’t.

If this is still new to you, try and imagine yourself receiving these numerous emails, sent to your address, calling you “DEAR FRIEND AND FUTURE MILLIONAIR”, and having to process them one by one, at the risk of missing something important in this distracting and, frankly, often irritating process. Now, imagine yourself receiving these emails on your GPRS or 3G phone, which is nice and small, and gives you constant access to your email inbox, but on which you still pay for every byte, both directly and to support the infrastructure.

Recent technology trends have not only made it possible to read email in real time on inexpensive phones with “always on” internet connections, or on satellite phones with expensive connection fees, but they also helped the senders of unsolicited email in more than one way. For example, thanks to new and increasingly popular software every personal computer can easily be set up to be its own SMTP server (i.e. to send emails without using your ISP’s SMTP server), thereby making it relatively useless to block the address of a specific SMTP server, since the same IP address could, the following day, be used by another user who needs to send “legitimate” email using some similar SMTP server software installed on a notebook (having your own SMTP software on a portable computer is quite useful, as you don’t have to reconfigure your mail programs for every different internet connection you use as you travel).

While sending mass emails is getting technically easier yet more powerful, encouraging “hit and run” behavior and enhancing aspects which existing laws did not completely cover, one of the most prominent businesses promoting themselves via unsolicited commercial email now consists of… the business itself. Preconfigured CDs containing millions of email addresses and SMTP server and other software to be used to prepare mass mailings now cost less than $100, and can be used on any PC. I must confess that reading some of these emails, and considering the similarity in style with other emails which tend to play with emotions, illusions, hopes and “immense potential” more than with solid facts, I was wondering whether, after all, advertising via unsolicited email produces any positive results at all, other than for those offering these CDs and mailing services. It certainly looks to me like there is a new emerging trend in which the phenomenon increasingly promotes itself.

Emerging technologies, such as ENUM (Telephone Number Mapping), which can be used to map telephone numbers to email addresses and other information and systems on the internet, also have the potential for new types of exploitation and abuse. Just imagine robots which query ENUM servers to harvest valid email addresses based on random phone numbers, and then use this data to automatically send unsolicited email.

Unsolicited commercial email is exploiting that fascinating part of technology in which the cost to reproduce something in unlimited quantities tends to zero, and in which everything which is technically possible and legally accepted happens not only in theory, but also in practice. However, this only applies to the senders, whereas on the recipient side different rules apply, and the amount of energy, time, money, patience and legal resources tends to be in proportion to every single message or sender. So what starts with one “Congratulations!!! You have been selected as a finalist in the Getaway Travel Sweepstakes!” email per week can easily evolve to a point where one thousand or one billion different companies or individuals start sending similar emails, every day, to users like you and me. The more messages you receive, the more it will cost you to do something about it. And, of course, these messages will all say “This is a one time mailing – no need to unsubscribe”.

It All Began with “Junk Mail”…

It all started in 1995, when my private CompuServe account began to receive some unsolicited advertisements. These first, few, emails often included lists of other recipients’ email addresses (which all other receivers could read), which were tens of times longer than the message itself. In most cases, no effort was made to hide the sender’s email and server data. A few of these advertisements offered for sale lists of email addresses. This is probably how the chain reaction was ignited. Technical considerations apart, I was quite annoyed: access to my CompuServe account was through an expensive toll call (there was no local CompuServe access number), and in at least one case it happened that these dozens of unsolicited mails filled the hard disk partition where my mail was stored, so that I could not receive the messages I really wanted to receive. I contacted CompuServe security, but, apparently, there was nothing they could do, or wanted to do. I can’t help considering that I was paying money for each minute online, even while I was dealing with undesired mail. For some of the organizations involved, this meant additional profits. The more incoming junk mail there was in the mailbox, the higher the telephone and online connection fees that I and other users like myself had to pay.

At that time (between the last months of 1995 and the first months of 1996), most of these unsolicited emails originated from a few Internet domains, which were used for different types of messages, and most of these offered the possibility to be removed from their “service”. Not that I felt right about users having to waste time doing this each time they log on, but I asked to be removed from all of these lists. I did this more or less in a single day. The effect was in part unexpected: the domains from which, until then, most of the junk mail was flowing in, suddenly did not appear any more in the email headers. However, I was receiving about the same amount of junk mail, only coming from apparently random domains. What a coincidence, I thought…

By mid-1996, it sometimes happened that I had to download 100 Kbytes or more of junk mail at a time from my CompuServe account. Sometimes junk mail caused the account to overflow, bouncing back “legitimate” messages, as well as more junk mail. This is because, like many service providers, CompuServe has a storage limit on incoming mail, and at the time that was 100 messages. Everything in excess of that would bounce back to the sender, until I downloaded my mail, making room for more. Needless to say, more than 90% of messages were junk mail. Like many others, I lost potentially important professional opportunities because mail from people I had given my address to did not get through.

Obviously, I stopped using the CompuServe email account. At the time, most Internet Service Providers (ISPs) and legislators were unsure about what action to take, if any. It seems that, as I am writing this, the situation is more or less unchanged, except that more junk mail than ever is floating around, contributing to the overload of the internet, and to user frustration.

How can this be? When I access my email using a cellphone, each piece of junk mail costs me a lot of money and time. Cellphone or not, junk mail can, as I described above, prevent me from receiving important email, for example by filling up my hard disk or my email account. Users with a single phone line have to keep their phone line unnecessarily busy to download junk mail they do not want. In the meantime, their (voice) phone line is busy, and others cannot reach them for matters that may be very important. When I access mail on my ISP’s account, I have to pay telephone connect time and monthly subscription, as well as ISP connect time and monthly subscription. I think people have a right to choose what to pay their telephone and ISP bills for. I do not pay any of these to receive postal advertisements, and postal advertisements do not prevent me from receiving other mail! How can some people selling software designed to send junk mail claim that unsolicited email cannot be compared to fax advertising (which is illegal in most countries, because it keeps the recipient line busy, uses the recipient’s paper, etc.)? In my opinion, unsolicited email advertising is even worse than unsolicited fax advertising, because with email the recipient pays most of the costs, whereas with fax (and postal) advertisements it is the sender who has to pay for most of the delivery. The recipient’s time and frustration, busy telephone lines, telephone fees, ISP fees, disk space, lost email, does all of this have no value?

I forgot to mention this: my CompuServe account is a German one, and I collect my email from different countries, which have different laws about unsolicited email advertising. In Germany, this practice is prohibited, more or less like unsolicited fax advertising. Yet, this account keeps receiving junk mail from the US every day. Don’t the senders realize that they may be in violation of international laws? Don’t these organizations know that CompuServe addresses beginning with “1” are assigned to non-US residents? Don’t the senders check the US InterNIC database to see whether a domain (.com, .net, .org etc.) is registered to a US organization or not? Not as far as I can tell, judging from the email I see floating around. But even that wouldn’t be enough, because, even in the case of a domain registered to an organization residing in a country in which “spamming” is not against the law, the individual recipient of the email may reside in a territory in which the same is illegal. In theory, these doubts could be enough to stop more than 99% of today’s junk mail, if it originated from responsible and careful senders. Instead, they go on, hiding behind fake addresses and headers (as if this alone weren’t a sufficient sign that there is something fundamentally wrong with this).

As long as there is even only one jurisdiction in the world in which “spamming” is illegal, “spammers” should in my opinion actively check that their unsolicited mail is not being delivered or read in that jurisdiction. Of course, this is neither practical nor possible, especially considering mobile users. This would lead to the conclusion that “spamming” is potentially illegal in all cases, and senders should accept the related possible consequences. As far as the collection, storage, distribution, sale and use of lists of email addresses and other personal data from newsgroups and web pages is concerned, it should also be noted that in many countries this is subject to separate data protection and privacy regulations.

Theoretically Traceable, Practically Anonymous

SMTP (Simple Mail Transport Protocol) is the internet protocol used by mail servers (SMTP servers) to process email requests. Senders of unsolicited commercial email very frequently rely on the unauthorized use of third-party SMTP servers. This can be of advantage for several possible reasons:

  • the use of a third-party SMTP server introduces a theoretically thin, but practically quite effective layer of perceived anonymity;
  • letting somebody else’s SMTP server do the work reduces the transmission time, the bandwidth requirements and costs, and the overhead of having to deal with transmission and address errors;
  • the use of different SMTP servers makes it more difficult for “anti-spam” systems to detect and block the flood of mails based on the SMTP server address;
  • the more SMTP servers are used, the more the abuse is fragmented into smaller chunks, and the less likely it is that each victim of such unauthorized use takes action (“you won’t sue me for just having sent a few kilobytes through your server, will you?”);
  • using somebody else’s SMTP server can be a way to bypass an ISP’s requirement that clients not engage in “spamming” through their own SMTP servers.

A computer can send a single request to an SMTP server consisting of an email message body and hundreds of addresses, and the SMTP server will then in turn do most of the work and send those hundreds of emails as requested. In the original SMTP implementations it was not considered necessary to require any type of explicit authentication of the requesting system. Even without a username and password, SMTP operations are however not anonymous, because SMTP is built on top of TCP, which consists of data packets contained in lower level IP (internet protocol) packets. Every IP packet sent from A to B has to contain the “internet addresses” of both A and B. A mail server can be configured to only accept requests from addresses residing on the local network, or otherwise associated with specific systems.

The sender address contained in an IP packet can be forged (one just has to send a packet with a fictitious A address), but then B would not be able to send data back to A. In a procedure known as “IP spoofing” a malicious sender A sends an IP packet to B providing an incorrect A reply address to B. But an IP packet alone is not very useful. TCP/IP, which is the combination of TCP packets inside IP packets, and which is used for SMTP requests, additionally makes use of TCP “sequence numbers”. For successful “TCP spoofing”, a malicious sender A has to not only forge its address both at the IP level and in the TCP packet, but it also has to predict the correct sequence number that B will tell it to use in the first reply. TCP sequence number prediction attacks are neither easy to implement nor guaranteed to succeed, but they are technically possible. Additionally, SMTP introduces a further level of interaction between A and B, so that a malicious sender A would have to correctly predict the answers of the SMTP server, and respond accordingly. Unless errors or other unusual circumstances occur, it is however generally possible to estimate what an SMTP server will answer based on the requests of A.
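To make the fan-out concrete, this is what a bulk request looks like from the sender’s side in Python; the host and addresses are placeholders, and smtplib merely speaks the SMTP dialogue (MAIL FROM, one RCPT TO per recipient, DATA) on the sender’s behalf:

import smtplib

message = b"Subject: Hello\r\n\r\nOne body, many recipients."
recipients = ["a@example.com", "b@example.com", "c@example.com"]

# One TCP connection and one message body; the (possibly third-party)
# SMTP server performs the per-recipient delivery work, and records the
# client's IP address in the Received headers it adds along the way.
with smtplib.SMTP("mail.example.com") as server:  # placeholder host
    server.sendmail("sender@example.com", recipients, message)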

If the SMTP server is configured to only accept requests from a known local address range, then a simple firewall system can be put in place to filter all packets sent from the internet to the SMTP server which have a “spoofed” address: if the packet comes from “outside” but carries a forged “inside” sender address, the packet is blocked, and alarm bells ring. However, especially in large and complex networks, this may also unnecessarily restrict legitimate use of the mail server. This can in turn be solved by additional protection systems, whose complexity is however the main reason why many SMTP servers are “unprotected” and accept requests from any client. This does not mean that requests are anonymous: the IP address (real or fictitious) is logged as part of each SMTP transaction, and becomes part of the email header.
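
As a minimal sketch of that firewall rule, assuming a local network of 192.0.2.0/24 (an illustrative address block): a packet arriving from the internet must not claim an “inside” source address.

    # Sketch of ingress filtering against spoofed "inside" addresses.
    from ipaddress import ip_address, ip_network

    LOCAL_NET = ip_network("192.0.2.0/24")  # assumed local address range

    def should_block(source_ip: str, arrived_from_outside: bool) -> bool:
        claims_inside = ip_address(source_ip) in LOCAL_NET
        return arrived_from_outside and claims_inside

    assert should_block("192.0.2.7", arrived_from_outside=True)       # spoofed
    assert not should_block("192.0.2.7", arrived_from_outside=False)  # legitimate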

In brief, in order to remain anonymous and still have its SMTP requests accepted, a malicious sender A would have to:

  • Forge its sender address A, making sure it is an address accepted by B to process SMTP requests
  • Correctly predict the TCP sequence number that will be requested by B (B’s reply is lost, as the A address is forged)
  • Further interact with B making assumptions about the contents and timing of requests from B to A (again, all messages from B to A are lost)
  • Make sure that no real A is online and replies, or else attack such an A system while communicating with B, so that the real A has no time to report an error to B
  • Hope that no firewall is installed between B and the internet which blocks internet packets which carry fake sender addresses appearing to originate from the local network, if that is what A did
  • Hope that no intermediate routers or other systems keep a record of its activity

In practice, making even one connection in which all of the above conditions are met is very, very difficult and time consuming. Sending millions of email messages in this manner would be both highly impractical and beyond the technical skills of the average sender of unsolicited commercial email. This means that in practice:

  • The identity of sender A is not forged
  • Sender A can only use third-party SMTP servers which accept requests from unknown systems (no password protection or other address-based restriction)

Unfortunately, the damage/cost ratio in cases of SMTP abuse, which is covered by existing computer crime laws much more than “spam” itself, works to the advantage of the “spammer”. Even if the yearly total worldwide damage is high, no single party usually sustains a cost high enough to motivate expensive technical and legal work. The same logic also applies at the next level, i.e. the investigative and judicial phase (low damage, high cost, case archived). So it seems that SMTP is not anonymous enough to make a sender untraceable, but it is anonymous enough to work well for “spam”.

SMTP now does support authentication (e.g. RFC 2476, RFC 2487, RFC 2554, RFC 2645, RFC 3207), so in theory it is much easier to trace a message back to the real ISP and sender. The Authenticated Mail Transfer Protocol (AMTP) specification builds further on this, aiming to replace SMTP with a more secure derivative. Authenticated SMTP, or AMTP, however, is not yet a requirement on the side of ISPs and network operators. Maybe the increasing cost of “spam” will accelerate this requirement.
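
For illustration, this is what authenticated submission looks like from a client’s perspective, in a minimal Python sketch using the standard library (host, account and addresses are hypothetical): the session is upgraded to TLS (RFC 3207) and the sender logs in (RFC 2554) before any mail is accepted, tying the message to a real account.

    import smtplib
    from email.message import EmailMessage

    msg = EmailMessage()
    msg["From"] = "user@example.org"
    msg["To"] = "friend@example.net"
    msg["Subject"] = "Hello"
    msg.set_content("Sent through an authenticated session.")

    with smtplib.SMTP("mail.example.org", 587) as smtp:  # submission port
        smtp.starttls()               # RFC 3207: encrypt the session
        smtp.login("user", "secret")  # RFC 2554: authenticate the sender
        smtp.send_message(msg)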

To Regulate, or not to Regulate?

My opinion is that junk mail should be regulated by law, just like advertising over fax, automatic voice calling systems, pagers, SMS short messages and other electronic media has already been regulated in many countries. The simple underlying logic would be that receiving a piece of email costs money and resources, just like receiving a fax. I don’t like the idea in general, but I see no alternative, especially where existing regulations do not cover things like unauthorized access to a third party’s SMTP server, or forged or missing sender identity information.

Whatever regulation is considered, I would encourage it to also carefully harden itself against the emerging trend of “one time” mailings. If you allow these, even with an “opt out” option, you are de facto allowing unlimited one-time mailings to paralyze the system and annoy users as if there were no regulation at all. Similarly, sender email addresses which may be “valid”, but in which the user or domain part is used only once per mailing, or even once per destination address, make the requirement to use a “valid” address irrelevant for the purpose of automatic filtering, a possibility sometimes mentioned in the same context.

The best solution that comes to my mind is a combination of legal regulation (or extension/interpretation of existing laws, e.g. as applied to fax advertising, computer crimes, theft, privacy, etc.) plus a technical approach (e.g. authenticated SMTP, or AMTP), because the internet as a whole has clearly shown itself to be vulnerable to the abusive weight of unsolicited advertising.

In spite of what I wrote so far, I also think that junk mail should be allowed at a separate layer, and that there should be a way to flag, transport and deliver unsolicited messages, if the end user so chooses. But this should be implemented in a way that the proper mechanisms for authentication and distribution of costs and resources are applied. It is too late to filter one million messages after they have already traveled twice around the planet and been stored by the final ISP pending verification of a user option. The concept of allowing for the optional acceptance of unsolicited mail is important to me, because it means, after all, free speech, whereas blocking “spam” can become a form of censorship. I think that you should, at any time, be able to open the window and see what it is like outside, even if this means receiving a million messages a day.

I can in part understand the fear that a law may have negative side effects. If the regulations forbidding junk faxes can be considered a good precedent to compare with, can somebody perhaps tell me what negative side effects they had? I think here we are talking about laws which give people more freedom than they take away. This is the freedom not to have to answer a phone only to hear a robot playing a tape, the freedom not to have your fax paper or hard disk space consumed by junk faxes and email, the freedom not to have thousands of robots try to access expensive 24×7 human customer service resources. The freedom not to have to waste money and time on something you have decided you don’t want to receive, and the freedom to still receive all of this, if you so wish.

… Then Came “Anti-Spamming”

The most disruptive episode I experienced in relation to “spamming” felt worse than the effects of three years of junk mail combined, and it was what actually led me to write this.

I was using a very large, professional and reliable ISP, the best in the country, in my opinion. One day a CompuServe user sent some junk mail using my ISP’s SMTP server. The sender probably had no contract or other right to use this SMTP server to flood the net with this “Earn $280 – $500+ weekly” message, but this nevertheless occurred. The sender may even have had, in theory, the malicious intention of blocking that ISP’s mail service, rather than disseminating junk mail. Whatever the case, both results were achieved. Within a short time, the IP address of my ISP’s SMTP server was put on a sort of “black list”, which, I discovered, was checked by a large number of organizations and ISPs to determine whose mail to accept or refuse, probably on the (incorrect) assumption that all mail coming from a mail server whose IP address is in the “black list” is junk mail. Indeed, I was later told, we were honored by none other than the “mother of all ‘black lists’”.

Instantly, thousands of “innocent” users like myself had their email blocked, at least with respect to recipients whose systems were checking this “black list” (I was surprised to see how many did). Hidden in the error messages which kept bouncing back to me from a dozen sites, I found a hint of the system which was at the origin of our troubles. I use the word “hidden” because experience tells me that the average user does not look into these “Returned mail: Service unavailable” reports, scanning the technical contents. Anyway, I went to the web site where I was hoping to find an explanation of the “problem”, but the system, belonging to the maintainers of this “black list”, could give only one answer: “ERROR… The remote site or server may be down”. So I sent an email to the company behind these not so efficient technologies, telling them that their site was down, that their service was blocking my email, and asking what I could do about it. Very efficiently, within seconds, I received a reply: “Access denied due to spam and mail abuse”. It should be noted that I also tried, with the same results, the address that ISPs were supposed to use to get help when their own mail server is trapped in the “list”.

In the meantime, I realized how many messages, all sent to different people in different countries, were bouncing, all in relation to this “service”. My work was being interrupted – not by junk mail, but by somebody’s intention to stop junk mail. Somebody had decided that, based on one piece of junk mail, all users of a very large and professional ISP had to have their email blocked.

I called my ISP, and they told me that they had already contacted the maintainers of the “black list” three days earlier, and that they were taking the problem very seriously, even considering legal action. Just to be sure, I tracked down the fax number of this “black list” company, and explained the situation to them again, in my own words.

The following day, my email was still bouncing, but at least the web site with the information about the “black list” was up again. This web site basically explained two things with which I do not agree. First, it contained repeated “reassuring” claims like “we have not singled you out” (no, they just “singled out everybody”, I thought, thinking of episodes in which an entire village was “punished” for something done by an individual), “we mean you no harm” (why do facts often end up so far away from the “good intentions” behind them, and how can somebody who is an active part of a system that punishes innocent people claim to “mean no harm”?), “we will help… usually within minutes” (our ISP has been waiting for four days now, and we are all still counting), “we are not the network’s police force” (from what I saw, it would indeed seem more like police, judge and jury at the same time) and “Loss of connectivity hurts us all. Spam hurts us all even more” (one opinion thousands of users like myself do not share – wait until “loss of connectivity” hits you, and see what feels worse, and whether it is right to fight a problem by creating another problem, even if possibly smaller, and with the best of intentions).

Does the goal justify the means? Can we hurt innocent people in relation to something committed by others? Does this solve the problem? I don’t think so. What I see here is a confusing and dangerous mixture of personal opinions and objective facts, good intentions and disrupting results. A good example of the need for a clear and official position, with real police, real judges, and real juries, rather than a multicolored variety of home-made imitations. Until we have this, both the senders of junk mail and the supporters of “black lists” will have their very own reasons to justify their respective positions.

Additionally, this site explained that what ISPs had to do, in their opinion, was to program their SMTP servers so that they would not receive and forward mail from users who were not logged on to that ISP (i.e. they should prevent “third-party mail relay”). But, I wonder, isn’t this a “solution” that, for many, is “proposed” too late, since it comes after people’s accounts have been blocked? Additionally, such a solution, even if applicable, could not, alone, prevent an ISP’s authorized customer from sending out one or more pieces of junk mail using that same ISP’s SMTP server. Even if sending junk mail were against the ISP’s policies, there are enough ISPs to allow for “hit and run” tactics, where the sender uses an ISP once to send thousands of messages, then changes ISP, and so on.

In my original article I had also written:

Also, I and many people like myself frequently use one ISP to log on to the Internet, and another organization’s SMTP server (to which of course we have to be granted access). In my case, this happens because I move around a lot with a portable computer, and I access the internet using different accounts (preferably with a provider who has local access, or using somebody’s LAN), yet I prefer not to reprogram my mail software to use a different SMTP server each time. Other people do it for testing, or for other reasons which may not be as frequent as junk mail, but which should not, in my opinion, be penalized by junk mail. Nobody in the internet user community, I think, should be made to pay any price for this abuse of the email system. The fact that something good is misused by some for illicit purposes should not, in my opinion, result in this possibility being banned for all other uses as well. If this were always the case, it would be a very high price to pay for peace and justice, especially when the “solution” does not solve the “problem”. Last but not least, let’s not forget that using a third-party’s SMTP server is often an ISP’s only hope to get its email out in order to ask for its IP addresses to be removed from somebody’s “black list”. In my case, using a different SMTP server was the only way to escape the effects of the “black list”. What an irony, isn’t it?

I now think that the above “Keep SMTP Free” defense is not applicable, for two reasons:

  • authenticated SMTP and AMTP, which were just beginning to emerge in 1998, now allow people to use any SMTP or AMTP server regardless of their current ISP or network configuration, thereby solving the legitimate needs outlined above;
  • the abuse of un-authenticated SMTP by both “spammers” and virus software has proven to be a technical weakness which is damaging the internet as a whole, so in spite of it having possible legitimate uses (now rendered obsolete by authenticated SMTP and AMTP), I think it should be treated like a security hole, and banned as we do with other defective software.

Other solutions have been proposed to try to stop, or at least reduce, the flood of junk mail. For example, many mail clients now have an option to automatically filter out mail coming from certain senders. Very quickly, junk mail senders “adapted” to this situation, putting random letters and digits in their addresses, so that no two pieces of mail carry the same sender address, and mail cannot be detected on this basis. (This use of random parts does not mean that the address itself does not work as a return address, i.e. it can be a “legitimate” and working address even if it has a random part.) Also, even in the best case, this mail is processed only after the user has paid the cost of receiving all messages, and after the other issues already mentioned, so much of the damage is already done.
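
A minimal sketch (addresses are invented) of why this filtering fails: the sender address is randomized for every mailing, so yesterday’s blocked address never recurs.

    import random, string

    BLOCKED = {"offers@bulkmail.example"}  # learned from yesterday's junk

    def random_sender(domain="bulkmail.example"):
        user = "".join(random.choices(string.ascii_lowercase + string.digits, k=10))
        return f"{user}@{domain}"          # different for every mailing

    def passes_filter(sender):
        return sender not in BLOCKED

    print(passes_filter(random_sender()))  # almost certainly True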

Other systems try to detect junk mail at the server level, on the assumption that an identical message received by several recipients at the same time is “suspicious” (if not to be automatically deleted). In the best case, this requires manual inspection of the mail by an administrator; in the worst case it is an intrusion into privacy, and causes delay in the delivery of requested or subscribed information (say, 1000 users of an ISP have deliberately subscribed to a newsletter), if not the deletion of some mail which has nothing to do with junk mail. Also, small sites do not have enough users to allow for this type of detection, and detection at big sites can easily be circumvented by sending junk mail messages at random times for each recipient, and with random changes in the contents (changing spacing, punctuation, minor errors, proper nouns, user name in the text body, etc.).
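
A sketch of such server-side detection, with an assumed threshold, also shows why content randomization defeats it: appending one random word to each copy changes every digest, so no single digest ever crosses the threshold.

    import hashlib
    from collections import Counter

    seen = Counter()
    THRESHOLD = 100  # assumed: more identical bodies than this is "suspicious"

    def is_suspicious(body: str) -> bool:
        digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
        seen[digest] += 1
        return seen[digest] > THRESHOLD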

Should I even mention this? From time to time I tried to add my email address to serious-sounding lists and web sites that collect addresses of people who do not wish to receive junk mail. Each time, I also added a unique test address that allowed me to see if the data ended up in “spam” lists, just in case. I do not know if, as a consequence of this, I was receiving more junk mail or less, but I honestly couldn’t tell the difference. However, Sally Hambridge and Albert Lunde, authors of RFC 2635 (published more than a year after my tests), reported that “Careful tests have been done with sending remove requests for ‘virgin’ email accounts (that have never been used anywhere else). In over 80% of the cases, this resulted in a deluge of unsolicited email, although usually from other sources than the one the remove was sent to.”

I believe that the fragmented attempts at self-regulation, do-it-yourself “justice”, and other forms of private and market control of unsolicited email advertising have failed, in some cases even adding chaos to chaos, and damage to damage. For this reason, I believe that some type of higher-level regulation is necessary to control the issue as a whole, ideally with a higher priority where the phenomenon is more intense, such as in the US.

Conclusion in Sight?

After five days, the account with my preferred provider was still blacklisted. So I decided to write again to the company maintaining the “black list”. As in the other parts of this text, I’ve decided not to disclose names and other contact data, other than those of “institutions”. I already mentioned CompuServe. The other service provider I mentioned before is Italian Telecom’s Interbusiness, which offers Internet connectivity to Italian ISPs and corporations, using anything from ATM down to ISDN, and is Italy’s most important backbone and connection to the world.

The text of my letter follows.

Dear […],

Following my two letters of January 24, I wanted to update you on the case concerning the inclusion of IP address Y.Y.Y.Y in your “blackhole list”.

I understand that the IP address is still “blocked”, in spite of repeated and numerous attempts on behalf of Interbusiness, as well as by ourselves, and presumably others, to have the address removed from your list. The inclusion of this IP address in the list is currently causing a lot of problems, and I would ask you once again, very kindly, to please remove it, perhaps in consideration of the following explanation, which, in my opinion, should convince you of the anti-spamming policies not only of this case, or of Interbusiness, but of many civilized nations like Italy and Germany, which have successfully banned spamming (without blocking any user’s email accounts).

I heard that one of the issues here is that Interbusiness does not have an “abuse(at)interbusiness.it” email address, nor a “policy” against spamming on its web site. About the email address, I spoke with them today, and made sure that they learned about your request, in case they didn’t already know. For your information, however, all Italian .it domains already have to have, by written contract, a fully operational “postmaster” account. This is regularly tested by the Italian NIC (the equivalent of the US InterNIC, which, as far as I am aware, does not enforce such regular testing). Don’t you think that this means of contacting a domain administrator may suffice, at least for now?

As for the policy, in case you do not know this already, all Italian domain operators have signed an anti-spamming policy with the Italian NIC. Otherwise, they could not have an .it domain. Beyond this, since 1993 we have had in Italy a set of laws specifically dealing with computer crimes, which make it a most serious offense to access somebody else’s server, forge email headers, disturb network operations by doing things such as mass spamming, create software and install systems for doing all of this, etc. Under these laws, the people behind the piece of junk mail which caused you to take this action risk up to five years in prison, or perhaps more, and very high fines. In addition to that, we have a good set of court precedents dating back to the days of fax advertising.

Don’t you think, in consideration of this, that perhaps Italian ISPs should not be required by you to have a “policy” about spamming, since the contracts and the laws we already have are much more than a “policy”? Proof of this, if any were needed, is that in Italy spamming was virtually eliminated years ago.

I do understand that the last remaining issue here is that Interbusiness’ SMTP server was misused by, probably, a US organization. I was told by Interbusiness – and I would like to share this information with you, in case you did not already know about it – that they are now evaluating different technical possibilities to prevent this from happening in the future. But please understand, they are a very large organization, with hundreds of servers, and such a decision and its implementation are not as quick to put into practice as for a one-man ISP. I cannot speak for them, of course, but this is my opinion.

Beyond the technical aspects of the server’s misuse, it must be considered that several Italian laws have been violated. This US organization made a very big mistake in using a server in Italy, one of the countries where accessing and using somebody else’s computers is punishable under some of the toughest and most up-to-date computer crime laws anywhere. This could, maybe, even create a precedent for your cause, regarding US spamming (also see, perhaps, my notes on this issue on [edited]).

I can assure you that legal action is being considered by several parties, and more than one investigation is already in progress. The company named in the advertisement has been contacted, and seems to be very supportive in this case. They claim that a competitor of theirs sent this message. Comparing it with past advertisements of this company, which have already been listed, it should not be difficult for anybody to determine whether the message was meant to discredit it. As far as I know, CompuServe security is now being contacted to see who was logged on and sent the mail to the Interbusiness server. Information about the two email accounts mentioned in the mail is also being collected. If you, with your technical experience, can find more information inside this piece of mail, or have any suggestions to identify the senders, please let us all know.

However, I must also remind you, if necessary, that, by deliberately continuing to directly or indirectly block email of Italian users, with destinations which potentially include all countries of the world, you are also exposing yourself to a variety of jurisdictions and laws. The companies involved here are big enough to set precedents in more than one direction.

To conclude, I can only say that it is my personal opinion that this case is being dealt with not only with due diligence, but probably with more effort than any of the cases in which you “help within minutes”, as explained on your web site. For this reason, it is very difficult for me to understand why, after so many days, you are still blocking the email of thousands of Interbusiness users like ourselves.

Thank you again for what you can do.

Aftermath

The day after the letter was sent, the maintainers of the list replied, and “unblocked” the IP address. Little more than a month afterwards (in March 1998), Washington became the first US state to pass a law that makes it unlawful to send email advertising with forged or hidden header and other sender data, or containing misleading information in the subject line. Apparently, though, it did not help much…

2000 Update

More than two years after writing the above, I again had a chance to be surprised by the “creativity” of an organization presumably specializing in unsolicited commercial email. Actually, this episode made me dig a bit deeper into how “spammers” work.

While I was working at Cloanto’s Italian office in May 2000, somebody in the US (as confirmed by their ISP) began mailing unsolicited commercial email using a forged “@cloanto.it” address, through the SMTP servers of what appeared to be unrelated third parties, located in different countries, including the US and Italy. The items being promoted in this episode were not the usual pills or get-rich-quick schemes, but the essence of “spam” itself: millions of email addresses, complete with bulk mailing software.

Within seconds, Cloanto’s mail servers started receiving the first bounces and complaints from some of the intended recipients. Within minutes, the activity was traced back to the presumed sources.

A first lawsuit was filed:

Because the matter is still in the hands of police and judicial authorities in several countries, I am unable to provide full details. To make things more interesting, though, I can mention that the name of one of the individuals identified as part of my personal research into this specific episode is also mentioned in “Behind Enemy Lines – A Spammer’s Luck Runs Out when She Forges the Wrong Domain”, by “Man in the Wilderness”, which is an interesting account of a very similar episode, occurring at about the same time, of which however I was not aware at the time. If you want to learn more about how these organizations appear to operate, I recommend that you read Behind Enemy Lines.

The first lawsuit was based both on Italian civil laws and on more than 10 different articles of the Italian criminal code (Codice Penale), and was filed both on behalf of Cloanto IT srl and on behalf of myself as an individual. Since the episode involved, among other things, the alleged unauthorized access to third-party servers, unsolicited mailings to private citizens, use of server and connectivity resources, and the offering of commercial products, the laws invoked ranged from various computer crimes, to privacy, theft, unauthorized use of registered trademarks, unfair competition, and several others.

In theory, in consideration of the criminal charges which have been filed, it is my understanding that the individuals involved in this episode risk extradition to Italy and imprisonment. Maybe they will be extradited, or maybe they will be held at some airport upon entering the European Union, if they ever do. Or maybe nothing at all will happen. Italian justice is known to be a bit slow, and there are certainly more important matters than an episode of “spam” which requires relatively complex international procedures in a country which is not known to excel at speaking English. On one hand this is not a case of homicide, but on the other hand the cost of unsolicited commercial advertising is estimated at billions of dollars per year (2002 data by Ferris Research and Gartner). And if “spammers” start thinking twice before using email addresses, domain names and SMTP servers which could be covered by laws in any part of the world, maybe this will help.

2002 Update

Just a quick update about some trends I noticed during 2002:

  • The word “spam” has become so popular that I am going to begin using it without quotes, in spite of it originally meaning something else.
  • The use of addresses which never appeared on web sites and newsgroup posts suggests that a large amount of email data has been extracted from personal email address books or network communications via tools like email-collecting ActiveX controls, “virus” programs, and/or TCP/IP packet sniffers. In plain words: you can work as hard as you want to keep your addresses private, but if a friend of yours installs some malicious software by mistake, your address will ultimately be collected, one way or another.
  • Fake sender addresses in general are increasingly being used, with little or no respect for the abuse of real domains and addresses, and with intense rotation of multiple From addresses for the same mailing. I see fewer From addresses which, even if they traced back to over-quota mailboxes, at least appeared to be real. Changing the From address for each message or small group of messages is both a primitive attempt to avoid address-based filtering and a way to distribute costs more evenly. For such an illegal practice to stay alive, finding out who “spammed” you or used your data in a fake From address has to remain considerably more expensive than the damage you sustained, which is an important part of the synergy of factors making the whole phenomenon possible in the first place (other important aspects include lack of prosecution, relative simplicity and anonymity compared, for example, to X.400 mail, and the cost of sending unlimited emails tending to zero).
  • Fake sender addresses from the same group of people as the recipient(s) appearing in the To field (by domain and as originally collected, e.g. via software running on a user’s computer) are increasingly being used, the logic being that if a mail appears to be from a friend or co-worker it may more easily appear to be legitimate and pass through a filter.
  • Fake sender addresses are increasingly being used in email messages appearing to be sent by “anti-spam” activists and organizations. The messages are subtle enough to appear to come from the senders they claim, but annoying enough, in the way they address the issue, to discredit those senders.
  • Fake sender addresses are also increasingly misused by virus programs, suggesting that a single approach against easily forged sender addresses could help better control both problems.
  • Random destination addresses are increasingly being used (e.g. not only “sales” and “info” at any domain are being targeted, but also “jody”, “cliff”, etc.), further indicating how low the cost is to send out such emails and not having to care about consequences or proper data maintenance.
  • I am receiving spam in English (more than 90%), Spanish, Portuguese, Chinese, French, German, Italian, etc. Does this mean I know all of these languages? No. It is just another example of how inexpensive it is to randomly spam people.
  • Subject line and message body obfuscation (e.g. “V1AGRA” with the digit “1” instead of the letter “I”) and randomization techniques (one or more random words, or entire random paragraphs from long texts, e.g. books or web sites) are increasingly being used to make each message different from other messages (a sketch of the corresponding filter countermeasure follows this list). This is just an example of the dynamics of adaptation of one front as the other front introduces new tools to defend itself. As long as it is legal, I expect this adaptation process to continue. It only takes one smart programmer to empower a million technically unskilled “spam kiddies”.
  • Spam is increasingly used to advertise itself as a product or service.
  • As network connectivity is increasingly being offered in hotels and through public wireless access points, unsolicited commercial email is being increasingly sent via notebooks from hotel rooms and through wireless networks (intentionally public, or unsecured enough to be accessible by anybody driving by with a notebook and an antenna).
  • “NetBIOS Spam” or “Messenger Spam”: the Windows Messenger service is being used to convey pop-up window advertising messages through ports like 135, 137 and 139, which are used for other purposes as well, forcing network administrators to shut down yet another useful service. Under the protection of firewalls I was not personally aware of the magnitude of the phenomenon, until I saw the messages pop up every few minutes on a system which had just been placed online. Seeing is believing, they say.
  • Spam filtering products and services are getting increasingly popular, but not one of them can guarantee that non-spam messages (written by humans or machine-generated) are not filtered by mistake. Once again, the innocent senders and recipients of legitimate and possibly important emails pay the price of spam, and the careful and time consuming manual sorting of incoming mail remains the only reliable way to process unsolicited mail.
  • Some spam filtering products appear to generate more annoying mail than they eliminate. I keep receiving Turing test requests (to prove that I am human, i.e. not a spam sender) to confirm that a message “I sent” is not spam. Whoever developed such software apparently never considered that the majority of spam now uses forged sender addresses, so the message text is incorrect (it more or less explicitly accuses people of having sent something they never sent) and the message itself is annoying (as it “spams” people who have nothing to do with the spam mail). Once again, innocent people pay the price of spam…
  • Spam is making people angry, affecting their health and how they react to other people. Imagine going to work on a Monday morning, switching your computer on, finding 419 junk mail messages, and then taking an important call…
  • The fact that users have to increasingly hide and/or frequently change their email addresses, and/or resort to spam filtering products and services to defend themselves from spam, rather than being protected by laws, is increasingly supporting and legitimating spam itself. I am noticing several emerging parallels between this situation and that of “virus” and “antivirus” software.
  • Spam is increasing in volume and in variety, especially when it comes to more aggressively cultivating illegality and ignorance (e.g. alleged depictions of rape, modern-day snake oil formulas, get-rich-quick schemes, exam cheats, pirate software, cable and satellite TV cracks, etc.), which seems to tell a story about the perpetrators, their consumers, and the society which nourishes this.
  • The maintainers of The Spamhaus Project, who also operate the Register of Known Spam Operations (ROKSO), estimate that about 90% of all spam received by Internet users in North America and Europe is sent by a hard-core group of about 100-200 individuals (data released during the year ranged from “100+” to “180+”). They note on their site that “These known, professional, chronic spammers, many with criminal records for theft and fraud, are loosely grouped into gangs (‘spam gangs’) and move from network to network seeking out Internet Service Providers (‘ISPs’) with poor spam control and taking advantage of the slowness of some service providers to terminate them.”
  • Innocent people and organizations are increasingly paying the cost of unsolicited commercial email. I would never have imagined, back in 1998, that in 2002 I would not be able to send out an email containing the word “free” in the subject line, as it would be automatically classified as junk mail by some filters, unless I paid for a service which embeds copyrighted poetry in the email headers.
  • People are tired of spam. Really tired. “As seen on NBC, CBS, CNN, and even Oprah.”
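
As promised above, here is a minimal sketch of the normalization step a filter might apply against character-substitution obfuscation; the substitution table is illustrative, not exhaustive, and real filters combine many such heuristics.

    SUBS = str.maketrans({"1": "i", "0": "o", "3": "e", "$": "s", "@": "a"})

    def normalize(text: str) -> str:
        return text.lower().translate(SUBS)

    print(normalize("V1AGRA"))  # -> "viagra"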

LCD Display Discomfort

Last minor revision November 28, 2008. Original text published July 19, 1996. © 1996-2008 Mike Battilana.

Although this article was written with great care, it may reflect personal opinions of the author, which are not necessarily shared by the publishers, who cannot assume any responsibility for mistakes or misprints. Nothing in this article should be regarded as medical advice. If you require medical or other expert assistance, you should consult a professional advisor.

Introduction

As soon as I heard about LCD displays, and even more after seeing some flat panel monitors at computer shows, I became a fan of this technology. As a computer user sitting in front of a monitor for more hours than the sun shines in the sky, I was very happy that I would soon be able to enjoy a flicker-free and radiation-free alternative to bulky and power-hungry cathode ray tube monitors. That was until I tried them, beginning a story of hard to describe discomfort and an apparently vague relationship with existing research involving factors such as fluorescent light, flickering, lighting, glare, contrasts and patterns. I now refer to this as “LCD Syndrome”.

Quick Help

If you reached this page because you are experiencing symptoms with a new monitor you may want to try and reduce the brightness and/or increase the general ambient lighting. Excessive display brightness and large contrast between the display and the environment are common and easily solvable causes of discomfort.

First Experience

I found out in 1996 that I could not make full productive use of an LCD display (800×600 resolution) for longer than 20-30 minutes, after which I began to feel uncomfortable. This was not easy to describe, because the symptoms were not particularly strong, and they were going against the expectation and conventional wisdom that LCD displays were better than CRTs. I would say it was a combination of slight headache and eye irritation. I normally had no problems with fluorescent lighting, nor with any working environment in general. I do however feel a similar discomfort when standing for some time near certain mosquito killing devices which use a violet fluorescent tube to attract the insects.

A friend of mine develops symptoms similar to the ones I experienced after only about 5 minutes, and he was the one who led me to the first apparent cause of this: the fluorescent light used for the backlighting of the display. He is a teacher, and the first thing he does when he enters a classroom is to switch off the fluorescent lights. He says he is “allergic” to fluorescent light. When he purchased a notebook with a black and white LCD display, he found out that he had the same problem he has with fluorescent room lighting. I personally tried out a variety of LCD displays, from the cheap ones up to the high end, and I also spoke with a lot of people about this, concluding that the issue is surprisingly more widespread than I expected. The feedback I am now receiving from this text also appears to confirm this.

Frankly, I was amazed and puzzled as to why none of the reviews of notebooks or LCD displays that I was aware of at the time had ever mentioned this. The first thing that came to my mind is that this is a relatively new technology, and the new parameters that need to be evaluated are not yet part of our testing and buying culture. A second aspect of this issue is that the problem in my experience only develops under real working conditions, rather than when admiring the sharp pixels of an LCD display for a few seconds at a trade show, or testing the LCD viewing angle for a review or in a computer store.

According to my personal observations with panels employing this technology, many people cannot work continuously with such a display for as long as they can with a traditional CRT. Some people feel a headache after about half an hour of work in front of an LCD panel, yet they can stay 16 hours in front of a CRT. My situation improved when I conducted tests on word processing tasks with white text on a black background, rather than black text on a white background (which, traditionally, on a CRT with a sufficiently high refresh rate, is considered more ergonomic).

I am not a doctor nor an expert in this field, and I did not have a chance to conduct tests on a sample wide enough to be considered significant. The feedback I am receiving may be biased, as it is increasingly based on this article (i.e. people may be finding a description of the problem they have been looking for, rather than an unexpected and new perspective). Yet I am left with this personal feeling that there is “something wrong” with LCD displays that has not been researched or mentioned enough. I hope that these notes of mine can inspire somebody to perform more thorough investigations.

Fluorescent Light

While everybody appeared to be focused on the “zero radiation” advantage of LCD technology, I could not avoid thinking that, behind the liquid crystals (which the “L” and “C” stand for), there was a source of light. This happened to be the same fluorescent light technology which, I knew, was not recommended for use as the only light source in offices. So, if there was something less than ideal about using it too much in an office, how come nobody mentioned this in relation to the fact that LCD display users stare at such lights all day?

Checking things like the refresh rates and the frequency peaks of a source of fluorescent light is not normally done with CRT displays, and does not naturally cross one’s mind when thinking of liquid crystal displays. The fluorescent light is a separate component from the LCD display, and is never mentioned as part of the final “LCD display” product.

Two aspects of fluorescent lighting are in my opinion worth mentioning.

  • Fluorescent lights, like other types of lighting technology, including the sun, have their own frequency spectrum, with peaks at certain bandwidths.
  • Unlike the sun (and other lighting technologies), fluorescent lights are not stable, but rather, they are pulsing, i.e. they go on and off several times per second.

As technology evolved over time, the average refresh frequency increased from less than 100 Hertz (times per second) to several hundred or several thousand Hertz. With modern electronic ballast systems, frequencies above 20 kHz are generally chosen in order to avoid interference at frequencies that could be audible to the human ear. I know that many experts claim that we cannot perceive certain higher refresh rates, but when I consider that sunlight does not go on and off all the time, but rather is “always on”, I can’t avoid thinking that everything else is not as “natural”, and the possible side effects may not be obvious.

The negative effects of 100% fluorescent room lighting have been known and studied for some time, and thinking about it from this perspective I would find it logical that directly staring at a source of fluorescent light can be just as bad, if not worse. I know that many people have problems with the energy-efficient fluorescent room illumination in general, and prefer the traditional light bulb, which I believe has a wider frequency spectrum.

The incandescent filament of a light bulb probably also generates a more stable light than the fluorescent substrate under the intermittent pulses of electrons. Assuming that fluorescent light is a bit like the scan lines of a television, i.e. it turns on and off all the time, but our eyes and nervous system make us perceive it as a persistent light, I cannot understand how some publications that praise LCD displays ignore this similarity with what is possibly the most negative aspect of CRT displays. Even if you don’t normally perceive the flickering of your display or TV, if you point your eyes upwards, you may be able to discern some flickering in the lower part of the visual region (the peripheral area is more sensitive to flickering). I am sure that there is a range of frequencies, which may or may not include the refresh rates normally used for fluorescent light, which cannot be perceived by the average person, but which can cause discomfort in the longer term.

Even if it is claimed that flicker is not perceptible at rates of hundreds or thousands of times per second, it must also be considered that this applies to a single source of light, and not necessarily to multiple sources of lights, which may not be in sync with each other. In physics, when two or more waves are added up, they result in a new wave pattern. Interference phenomena might apply to LCD displays (e.g. displays that have two fluorescent tubes), room illumination (with multiple light sources), and combinations of LCD displays and room illumination.
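
A worked example of this interference, with illustrative numbers: two unsynchronized sources pulsing at slightly different rates combine into a slow “beat” at the difference of their frequencies, far below either rate.

    f1 = 120.0  # Hz, e.g. a fluorescent tube on a magnetic ballast
    f2 = 118.5  # Hz, a second, unsynchronized tube

    beat = abs(f1 - f2)
    print(f"beat frequency: {beat} Hz")  # 1.5 Hz: a visible slow pulsation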

Could it be that the fluorescent light is a factor or co-factor in this “LCD Syndrome”? And if so, is it because the spectral distribution of the light is not what evolution trained us to live with, or because a pulsating source of light is used for the background illumination, or for a combination of both reasons? I know people sensitive to fluorescent light, but also people sensitive to flickering in general (even to the way frames are displayed at movie theaters, which is considered relatively stable).

At the time of this writing, it is expected that LEDs and other technologies will replace fluorescent backlighting in LCD displays.

Other Causes of “Flickering”

Old fluorescent lights are not the only cause of flickering. Even when the most stable backlighting and content rendering technologies are available, fluctuations are sometimes added intentionally:

  • “Brightness modulation” may be employed as a power saving mechanism or to increase the overall dynamic range. For example, with some LED devices, the LEDs are rapidly turned on and off, or otherwise changed between varying levels of intensity.
  • “Temporal dithering” (also referred to as Frame Rate Control, or FRC) works at the pixel level. Just as you can create a pattern of yellow and red pixels (spatial dithering) to create the illusion of orange, it is also possible to quickly alternate red and yellow in the same pixel. This technique is sometimes employed to increase the perceived color space, for example to render 24-bit color on LCD panels that without dithering would be more similar to an 18-bit display.
  • On some LCD monitors the backlight has maximum stability at maximum brightness, but different degrees of modulation are applied when brightness is reduced. In practice, the image begins to “flicker” as brightness is reduced.

The above may result in additional interference patterns when combined with each other, and with room illumination or backlight.
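
To make the first item more concrete, here is a small sketch of pulse-width modulation (PWM) with assumed, illustrative numbers: a backlight dimmed this way is fully on for only part of each cycle, so lower brightness settings mean longer dark intervals within every cycle.

    pwm_frequency = 200.0  # Hz, assumed modulation rate
    brightness = 0.25      # 25% of maximum

    period_ms = 1000.0 / pwm_frequency
    on_ms = period_ms * brightness
    off_ms = period_ms - on_ms
    print(f"each {period_ms:.1f} ms cycle: {on_ms:.2f} ms on, {off_ms:.2f} ms off")
    # each 5.0 ms cycle: 1.25 ms on, 3.75 ms off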

Pixel Sharpness and Pixel Patterns

LCD displays are better known for their brightness and their “sharp pixels” than for the fluorescent light they employ. What if one of the very factors normally considered positive in benchmarks, such as the crisp pixels, were part of the problem?

On LCD displays the individual pixels are much sharper than on CRT displays, thereby making it possible for the eyes and the brain to:

  • focus on each pixel
  • recognize pixel patterns

I believe that both factors could introduce new potential causes of discomfort, compared to CRT technology, considering that neither with CRTs nor in real life do we normally have as many tiny details to focus on with such clarity, and for those parts of the brain which discern movement and high contrast images to work on.

Microsoft Windows includes a font smoothing technology called ClearType, which uses the colored sub-pixel components of LCD display pixels to increase the perceived resolution, while at the same time reducing the contrast and sharpness of the (larger) individual pixels. It can be enabled in the Appearance tab of the Display Properties, under Effects…
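
The sub-pixel idea can be sketched as follows (a simplification; ClearType itself also filters the result to limit color fringing): on a panel with horizontal R, G, B stripes, three consecutive coverage samples of a glyph can be packed into the color channels of a single pixel, roughly tripling the horizontal resolution available for text.

    def pack_subpixels(coverage):
        """coverage: per-stripe ink coverage values, 0.0-1.0, length 3*n."""
        pixels = []
        for i in range(0, len(coverage), 3):
            r, g, b = coverage[i:i+3]
            # dark text on a white background: more coverage -> darker channel
            pixels.append(tuple(round(255 * (1.0 - c)) for c in (r, g, b)))
        return pixels

    print(pack_subpixels([0.0, 0.5, 1.0]))  # one pixel: (255, 128, 0)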

I later found out about research done in the field of “pattern glare”, i.e. a hypersensitivity to repetitive patterns, including stripes and lines of print. It seems to correlate with my observations on pixel patterns, also considering that in my 1996 800×600 LCD monitor the individual pixels could be discerned much better than on newer monitors (which, at much higher resolutions, have significantly smaller pixels).

Brightness

What else is so nice about LCD displays, besides their crisp pixels? Brightness, of course.

In my experience LCD displays have higher default brightness settings, and can reach an even higher level by adjusting the settings. A possible explanation for the high default settings may lie in a combination of technology and marketing. While CRT technology is limited in the maximum brightness that can be achieved (visible light is generated by the phosphor coating behind the glass when it is hit by an invisible beam of electrons), in LCD displays the rendering technology is separate from the light source, making it possible to use brighter sources of visible light. Since maximum brightness is often a parameter in monitor benchmarks, and may help a display get noticed in a store, manufacturers are tempted to prefer high-brightness technology in LCD displays (since it comes at a relatively low cost).

One piece of advice I can give in case of perceived problems in relation to LCD displays is, indeed, to try and reduce (even drastically) the brightness settings, as these displays can be, in my personal experience, much brighter than CRTs. I have seen this solve more than one problem case. I set the brightness to 0 (!) on my Dell 2407WFP monitor, and it still doesn’t look too dark, even to bystanders. The ambient lighting should also not be too low, relative to the display. If you suffer from discomfort, try to adjust the environment so that the ambient-to-display brightness contrast is between 1:3 and 1:1 (room background and display at about the same perceived brightness).

Polarization

Whereas sunlight and the light emitted by incandescent bulbs and CRT displays oscillate in multiple directions perpendicular to the light beam, the fields of polarized light oscillate in only one direction. Polarized light is not only produced by certain light sources; it can also be the result of non-polarized light being reflected from certain surfaces (e.g. water or glass), or being filtered by polarizing filters, which include certain sunglasses, liquid crystals, and the polarizer plastic sheets used in LCD displays. Several animals are able to detect light polarization. Some, like honeybees, use this sensitivity as an aid in their navigation.

On LCD displays a combination of polarized filters and the ability to electrically control the polarization angle of liquid crystal molecules is used to produce images. Two side effects of this technology are that:

  • the resulting light, as seen by the user, is polarized
  • as the user’s viewing angle changes, the visible image content changes too (more or less noticeably, depending on the display type)

Binocular Cues

Our sense of depth in viewing the real world is the result of several factors, which include the fact that each eye sees a slightly different view of the world. This is also used in stereoscopic (3D) visualization technologies, some of which have viewer discomfort as a well-known side effect. It is also known that when the angle or separation of two cameras used for a 3D film is not “natural”, this may lead to headache. This may be caused by the inability to cope with excessive 3D cues, or by “wrong” cues altogether. If this is a factor affecting 3D display technologies, and considering that most LCD displays have a viewing angle limitation which results in slightly different images (colors) being received by each eye, there could in my opinion also be a relationship with LCD display discomfort.

“Visual Stress”

Factors like the increased visibility of individual pixels and patterns of pixels (which phenomenon can be reduced, e.g. by using technologies such as ClearType or by using higher resolution displays), and the high brightness potential of LCD displays (which brightness can also easily be reduced by adjusting the monitor settings), lead to interesting parallels with the triggers and mechanisms which are related to headaches and other symptoms in a condition known as Scotopic Sensitivity Syndrome, Asfedia (Arrhythmic Saccade and Foveation During Edge Detection) or Meares-Irlen Syndrome.

When I first heard about companies like TintaVision and Irlen Institute I was prejudiced by the perception of organizations that seemed to be interested in selling colored glasses, filters and other “patented” methods without exposing a sound scientific method. I would have preferred some scientific research (published papers with double-blind studies and reproducible results, independently verified by others) rather than web sites providing only convenient examples and case studies going all in one direction. In the meantime, such research is beginning to appear. For example, see Bruce J. W. Evans and Florence Joseph in “The effect of coloured filters on the rate of reading in an adult student population” (Ophthalmic and Physiological Optics 2002 22: 535-545). At the same time, I could not find the topic being related by others to manifestations of LCD display discomfort. However, I also have observed how extreme brightness, which is on the opposite end of using colored lenses or reducing the monitor brightness, can be a factor in LCD discomfort.

Chemicals

Volatile organic compounds (VOCs) and other chemicals have been studied for years with respect to their relationship with “sick building syndrome” and discomfort related to “new car smell”. The chemicals that can be released when new computer equipment is unboxed and used for the first few weeks are often similar to those that seep from walls, furniture, carpets, glues, paints and plastics in buildings and cars, and which have been linked to headaches, watery eyes, sore throats, nausea and drowsiness. The symptoms have been traced not only to the individual components, but also, in a larger measure, to their combination, and even more so when ozone (from laser printers, copiers or car traffic) is added.

If you bought a new display, and perhaps even a new computer, and you suspect this to be a factor, you may want to put the packaging away and pay extra attention to air quality in the first few weeks (up to six months, according to some studies relating to buildings and cars).

Update

As of the end of 2001, i.e. after more than five years of additional experience since I first wrote this text, I keep getting a diversity of feedback from readers of this page. However, I am now myself using the 1600×1200 pixel LCD display of my notebook computer (an IBM ThinkPad A21p; pretty high resolution, and I love it, but I know that other people are not at ease with the small pixels). I actually prefer the notebook display to the same resolution displayed by a top-of-the-line brand name 22″ CRT set at a high refresh frequency. Although the LCD display is smaller, the pixels are much more detailed and crisper, resulting in the smaller display being much more readable than the larger one. This was, of course, also the case with the first LCD displays I tried more than five years ago, but something must have changed in the technology, or maybe it is the higher resolution, as I can now work all day on an LCD display without discomfort. It could be that at a resolution of 1600×1200 my eyes are not trying to focus on the individual pixels as they had possibly been doing at 800×600. Also, one of the first things I do on most new LCD displays is to reduce the brightness.

Changing the surrounding lighting (e.g. adding a small light to reduce the perceived brightness of the display) has also been reported to help. Quite often LCD displays are sharper, but smaller, than CRT displays of the same resolution, so that old habits may result in the LCD panel being positioned too far away, which can also cause a type of discomfort not experienced with larger displays. I’ve heard of cases where even an additional keyboard placed in front of a notebook computer caused the notebook (and its LCD display) to be placed too far away to be comfortable, yet the cause of the discomfort was not immediately obvious. On the other hand, the closer the distance, the more likely it is that the eyes recognize the individual pixels, which may or may not be related to the perceived problem. Working with notebooks, where one can’t easily adjust the distance and height of the display, we have moved one step back in ergonomics, compared to when independent keyboards were introduced (and often made compulsory with a lot of energy, compared to the silence accompanying the widespread use of notebooks).

Conclusion

It appears that LCD displays have introduced a new combination of elements which are not present in nature, possibly including issues such as spectral distribution, flickering, more recognizable pixels and patterns, increased brightness, polarization and “wrong” binocular cues. This is not what evolution has trained us to live with under the sun, and some people react to it differently than others. In particular, given a combination of these factors, some people appear to develop an “LCD Syndrome”. I hope that this little research helps, at least to know that if you read this page up to this point, you are not alone.

Feedback

Here is some feedback kindly provided by other readers:

I was wondering if you’ve heard from anyone who is experiencing these headache/nausea symptoms after purchasing an LCD TV? I have recently purchased an expensive 50 inch LCD […] TV and am finding it difficult to watch – about five to ten minutes in I start feeling headachy and nauseous. I love the screen resolution and find it to be a much better picture than other large screen TVs (other than plasma), but I’m afraid I have wasted my money. Has anyone given you advice in this area on how to relieve the LCD-induced strain?

Hi, you cannot imagine how grateful I am to you for writing your online article about LCDs and Fluorescents. It was such a tremendous relief to read all those testimonials and thereby know that I’m not alone in my experiences. May I suggest a way to make your site Google-findable to a greater cross-section of people, by adding more keywords, such as: “EMFs, electromagnetic fields from monitors, electrosensitive to monitors, electrosensitivity issues” (remember, some people may be deluded by mass misguidedness into thinking that theirs is an EMF problem versus a photosensitivity problem…)

I can’t tell you how relieved I was to find your posting on your website about LCD problems. I experience those symptoms too of eye fatigue and headache within a few minutes of starting up my new iMac. It is actually more problematic than my […] CRT monitor. I also have problems with fluorescent light boxes for film and flat panel light boxes. Quite a handicap since I am a photographer who wants to do more Photoshop. I keep looking for solutions, or scientific confirmation of what you and your respondents describe.

I just wanted to say thank you. I found your article on LCD Displays and Fluorescent Light, I now realize it isn’t just me and have told my boss I would like to go back to my old fashioned monitor.

#1 lesson I have learned over the years…. The eye is not designed to stare at the brightest thing in the environment! (Nothing of interest is in the Cavemen’s sky.) The solution? Always have something in your field of view as you work that is BRIGHTER than your screen. A small table lamp behind or directly above the screen works great at helping your eye adjust to the correct F-stop and prevents eye strain. Obviously old style fluorescent room lights should not be combined with computer monitors, because the two sources of flashing cause, to use the EE [electrical engineering] term “aliasing”, other frequencies of flashing. I think the newer compact fluorescents flash faster and are less of a problem. So in my experience most eyestrain is correctable with a $10 table lamp.

I stumbled across your page while looking for LCD information, and my experience is the opposite! I’m 20 years old, and I cannot sit at a CRT monitor, usually including a TV, for more than a couple of hours. The symptoms are a bad headache and more fatigue than with an LCD. I view CRT monitors in about the same position as I do LCD monitors. I don’t think it is the refresh rate, but it could be. I usually have it set to a minimum of 75Hz, which is when I can barely see the flicker from the refresh (although I can see it quite strongly out of the corner of my eye). I have the resolution the same as my LCD screen (1024×768) because that’s the max of my LCD, and CRTs tend to get too blurry when you put them at any more than 1024×768 (comparatively). Right now, I only have a laptop, and I use it in all types of settings. Even when it is on my lap, I don’t get a headache.

Thank you for your article… I have been suffering from the above for a month and you have told me more than any doctor/optician/ophthalmologist. I must have spent about 20 hours searching the ‘net for info… A month ago I started noticing the same symptoms as you: funny feeling in the eyes, slight headaches. The first day it took just one hour to recover. The second, it took three hours. Then the whole night. Then the whole weekend. Then by the end of the week it would take just one hour in the office before I had to go home!! However, in my case, I notice that I am now very much allergic to fluorescent lighting (I was not before…) I hadn’t thought about the possibility that it could be my new LCD screen which has thrown me “over the edge” by exposing me to all this light… sounds very likely now.

What a relief to read your article “LCD Displays and Fluorescent Light.” I’ve been struggling with a brand new […] 17″ TFT LCD for four months now. I spent $1000Cdn trying to avoid eye strain and radiation and I’ve just ended up with some of the worst headaches of my life. I even went so far as to get the company to send a replacement and nothing has worked. It would make a lot of sense if the problem was some kind of inherent issue with the fluorescent lighting.

I’m a computer engineer; I spend all day in front of my CRT monitor without a problem, year after year. I recently bought and returned several new CRT monitors because they had various focusing/convergence problems. I then bought a [LCD monitor], which is wonderful for its clarity and color saturation, but seems to be burning out my eyes and causing a low-grade headache, even when I reduce the brightness to zero (as dim as I can make the display). This is disappointing. I don’t think I can use this device. […] The LCD image seems to be good to your retina and bad to the rest of your nervous system.

I noticed irritation almost immediately, as you did, but tried to rationalize it as something else (sometimes if I eat too much citrus, my contact lenses will irritate my eyes as tear mucus levels drop), but this irritation kept on building, and my eyes still bothered me a little when I woke up the next morning. Then I noticed that the eye irritation and headache would literally surge when I switched the screen to a mostly white image (like black text on white background). […] There have been enough studies done confirming the bad physiological effects of fluorescent lights in classrooms and workplaces for us to infer that putting the same device behind an LCD shutter matrix one foot from your face is at least as bad as a fluorescent light overhead.

My general vitality at day three is getting worse. The eye-ache will progress to headache, and then to a very slight, passing wave of nausea. I am defocused and disinterested in programming matters that I normally pursue with passion. […] The problem is definitely white light. Whenever I flip to a screen with mostly white areas, the pain/tension/general unease/nausea will surge slightly. I fear that there is only so much I can do with this device.

Fluorescent light seems to induce migraines. I’m in a shared office at work and when I can convince everyone to turn off the overhead lights I usually feel much better at the end of the day. I’ve been using computers for about 20 years now and it is getting harder and harder to use normal monitors.

This sensitivity actually led me to Wilhelm Reich’s research indicating that fluorescent light is very disruptive of ambient energy. James DeMeo made a pretty impressive list of master’s theses and doctoral dissertations dealing with Reich’s theories… I think at least that something more than meets the eye is going on.

I have a fairly new IBM ThinkPad at home which I like very much, but as with the computer at work I can’t spend too much time at it without getting up and walking around the apartment for a while…

I found your discussion of LCD and fluorescent lights very helpful. I have never been able to use them successfully. My most recent attempt, on a truly lovely new Apple display, quickly led to vague eye discomfort and vague nausea. It wasn’t going to work!

Oh my god, you’re so right! […] Excited about all those “easy on the eyes” reviews of LCD displays, and frustrated by those flickery CRT monitors, I bought myself a new and shiny LCD monitor. Ten minutes after I started using it, my eyes became “dizzy”, accompanied by a little headache, and I could surely see the white background flickering like hell. I thought to myself: “what the hell? wasn’t I supposed to get relief for my sore eyes from this new invention?” Then I checked the web and I found your article about LCD Displays and Fluorescent Light. I almost cried when I realized what a waste of money this whole LCD shit is. I’ve put a lot of money into buying this new screen and now I realize that it was for nothing; my eyes will continue to suffer, and there’s nothing I can do about it.

I’ve been suffering from debilitating chronic daily headaches for almost 2 years triggered by the [both CRT and LCD] computer monitor. My headaches get better after I stop looking at the screen for about half an hour. The days I don’t use the computer at all I feel great. My headaches are also triggered by fluorescent lights. I tried a TFT LCD monitor, my headaches got better but I still got daily headaches. LCDs aren’t supposed to flicker but I could sure see them flicker.

Fluorescent is fluorescent is fluorescent, whether it is overhead or behind a small screen (if the LCD screen is fluorescent-lit). […] I’m somewhat rusty regarding the issue, but I will tell you what I remember. You are correct regarding natural, real sunlight. The spectrum is distributed fairly equally between the visible regions. UV is something like only 12% of the total. The rest is visible. The problem with fluorescent light is that in order to make it bright (sorta like “sunlight”) they must use a great deal of green and blue. “Full spectrum” uses a huge amount of visible blue. What it does is peak at an incredibly high level. I.e.: sunlight has an approximate level (I’ve forgotten the technical term, but if you need to know I’ll dig it out of my file) of 225 of visible blue, and fluorescent has over 1000 around the 435 nm level. The problem is, our eyes have developed to operate and see things visually at the range of 225, and fluorescent light overstimulates the eye to produce the same effect indoors. The retina of the eye is best stimulated by “blue.” I have in my possession only a fraction of the animal/insect research that proves visible blue damages the retina at a level that is not easily seen on routine eye examination. In the studies, if the exposure was short, there would be repair by the eye. If the exposure is long-term, there is permanent damage. When I was trying to have the ophthalmologists read the research I’d gathered, they would only say the animal research does not apply to humans. I don’t believe them.

The LCD screen I would imagine must not be much different from [handheld game with LCD display], would it? I did have a problem with those.

I had a similar experience with a palm-size PC, which uses green LED (I believe) backlighting. It appears that in order to conserve battery power the light is pulsing on and off all the time, like the fluorescent light on notebooks. When the light is on I can hear a hissing sound on the speaker, which may be the frequency at which the LED is switched on and off. I find it disturbing to even play Solitaire when that light is on.

Any comments or experience you would like to share are very much appreciated. If you would like to provide feedback concerning a newly purchased LCD display, kindly also consider adding a few details such as:

  • Did you place the LCD panel in the same position (e.g. on your desk) and lighting conditions (e.g. background) where you previously were using a traditional CRT monitor?
  • Did you try adjusting the brightness of the LCD display and its position (distance) relative to you?
  • What is the resolution of the LCD display, and what was the resolution of the previous CRT monitor you were using (if any)?
  • What was your subjective experience with the LCD display and how was it different from that with other types of monitors?

Thank you.

The GIF Controversy: A Software Developer’s Perspective

Last revision June 20, 2004. Original text published January 27, 1995. © 1995-2003 Mike Battilana. Parts are quoted with permission from CompuServe Information Service. Parts are excerpted from the PNG specification.

This article was written with great care. It may reflect personal opinions of the author, which are not necessarily shared by the publishers, who cannot assume any responsibility for mistakes or misprints. Nothing in this article should be regarded as legal counsel. If you require legal or other expert assistance, you should consult a professional advisor. Many of the designations used by manufacturers and sellers to distinguish their products are trademarks. The author of this article has made every attempt to supply trademark information about manufacturers and their products. GIF and Graphics Interchange Format are service marks of CompuServe Inc. PostScript is a registered trademark of Adobe Systems Inc. TIFF is a trademark of Aldus Corp.

Abstract

Between 1987 and 1994, GIF (Graphics Interchange Format) peacefully became the most popular file format for archiving and exchanging computer images. At the end of December 1994, CompuServe Inc. and Unisys Corporation announced to the public that developers would have to pay a license fee in order to continue to use technology patented by Unisys in certain categories of software supporting the GIF format. These first statements caused immediate reactions and some confusion. As a longer-term consequence, it appears likely that GIF will be replaced and extended by new file formats, though not before the expiration of the patents which caused so much debate (at midnight US Eastern Standard Time on June 19, 2003 for the US patent, and at midnight local time on June 18 and 19, 2004 for the European and Japanese counterparts, respectively).

Introduction

This is a very interesting case, which could teach more than one lesson on the theory and practice of software and the law. There are many entities involved. Fingers have been pointed at lawmakers, Unisys, CompuServe and developers. In theory, it may have been possible for any or all of these parties to prevent the matter from creating so much anxiety in the first place. Yet here we all are, debating the issue. This article intends to provide a collection of information, from the history of the controversy to the most recent events, as they were perceived by a software developer.

CompuServe released GIF as a free and open specification in 1987. GIF soon became a world standard, and also played an important role in the internet community. It was well supported by CompuServe’s Information Service, but many developers wrote (or acquired under license) software supporting GIF without even needing to know that a company named CompuServe existed. GIF was relatively simple, and very well documented in books, articles and text files.

GIF images are compressed to reduce the file size. The technique used to compress the image data is called LZW (after Lempel-Ziv-Welch) and was first described by Terry A. Welch in the June 1984 issue of IEEE’s Computer magazine. Unisys holds a patent on the procedure described in the article, but the article itself made no mention of this. The LZW procedure was simple and very well described, and it soon became a very popular technique for data compression (just as GIF would become a standard in its own field). It appears that neither CompuServe, nor the CompuServe Associate who designed GIF, nor the computer world in general were aware of the patent. GIF is not alone in the use of LZW. The TIFF file specification also includes LZW among its compression methods, and so do dozens of very popular file archiving programs (such as Compress).
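For readers curious about what the contested procedure actually does, here is a minimal sketch of the core LZW idea in Python (the function name and details are my own, for illustration only): the encoder grows a table of strings as it reads the input, and emits one code for each longest match already in the table. The GIF variant adds details omitted here, such as variable-width codes, clear and end-of-information codes, and bit-packing of codes into the data stream.

    def lzw_compress(data: bytes) -> list:
        # Start with a table of all single-byte strings (codes 0-255).
        table = {bytes([i]): i for i in range(256)}
        next_code = 256
        result = []
        current = b""
        for byte in data:
            candidate = current + bytes([byte])
            if candidate in table:
                current = candidate              # keep growing the match
            else:
                result.append(table[current])    # emit code for longest known match
                table[candidate] = next_code     # learn the new string
                next_code += 1
                current = bytes([byte])
        if current:
            result.append(table[current])
        return result

    # Repetitive input produces codes above 255, i.e. references
    # to strings learned earlier in the same stream:
    print(lzw_compress(b"TOBEORNOTTOBEORTOBEORNOT"))

The patent debate concerned precisely this kind of implementation, i.e. the table-building compressor, not the files it produces.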

While having the right to pursue legal action or seek damages against infringing LZW developers and publishers, Unisys has so far been very accommodating and fair. It is likely that the success of LZW and its thousands of implementations, especially among small developers, caught Unisys unprepared. Otherwise, it would be difficult to understand how Unisys could first allow a very large number of small and big developers to use LZW for years, and then, after the establishment of various standards based on LZW, change its attitude.

The original CompuServe/Unisys licensing agreement text which had upset so many developers was immediately followed by clarifications from both CompuServe and Unisys. Given that the online community tends to be suspicious about anything that is big, has a legal department or owns software patents, Unisys had to face a particularly delicate challenge. But it probably wasn’t easier for CompuServe, who had to explain the patent issue to its own developers, some of whom felt “betrayed”. The outside world would learn about this issue from the press in the following days.

Even Time Magazine reported on this matter, although like most of the newspapers it concentrated on GIF more than on TIFF, LZW, Unisys or software patents. In the meantime, a group of leaders of the online graphics community began working on a patent-free future for GIF. These efforts would later converge into the PNG specification. The full texts of official statements from CompuServe and Unisys are also included at the end of this article [edited out of this version].

Among the first reactions, some bulletin board systems had all GIF files deleted from their hard disks (or converted into JPEG format). Common remarks included:

“PROTEST OF NEW COMPUSERVE-UNISYS GIF USAGE TAX !!”

“They [CompuServe] seem to think that GIF is the greatest thing since free online magazines.”

“The announcement by CompuServe and Unisys that users of the GIF image format must register by January 10 and pay a royalty or face lawsuits for their past usage, is the online communications community’s equivalent of the sneak attack at Pearl Harbor.”

These reactions may require some clarification.

Unisys, and not CompuServe, has been “trying to impose” a royalty. The problem is not specific to GIF, but includes TIFF and archiving software.

GIF files are not covered by the patent. There is no risk in distributing GIF files or in using the GIF name. According to a CompuServe spokesperson, “Recent discussions of GIF taxes and fees are totally without merit. For people who view GIF images, who keep GIF images on servers, or who are creating GIF images for distribution, the recent licensing discussions have no effect on their activities.”

Only the software employing the LZW algorithm for writing GIF files is “at risk”. The Unisys patent includes claims which specifically cover the decompression of LZW-compressed material, so it may also affect simple GIF readers. Several patent attorneys consulted on this matter have concluded that decompression-only programs do not infringe upon the Unisys patent. Unisys however does not appear to share this opinion.

A format such as JPEG cannot be used as a substitute for GIF. Unlike GIF (and PNG), JPEG was designed as a “lossy” format. This means that it slightly changes an image as it is compressed. This is unacceptable for many applications. Also, while JPEG excels in compressing real world true color images, it offers no support for palette-based images (support for these has been added as part of the JPEG 2000 standard). Additionally, some JPEG-related technology has itself been the subject of patent claims (e.g. US patent 4,698,672).

The CompuServe licensing agreement was intended as a voluntary service to the few dozen developers creating software for use primarily in conjunction with the CompuServe Information Service (CIS). This includes applications such as CompuServe “navigators”, but does not apply to general purpose GIF readers/writers (which are not intended for use primarily in conjunction with CIS).

On January 27, 1995, Unisys announced new licensing policies regarding “The Welch Patent”. These include a 0.45% royalty on the total unit selling price of GIF/LZW products (minimum $0.10, maximum $10.00 per unit) and a 0.65% royalty on GIF/TIFF/LZW products (minimum $0.20, maximum $25.00 per unit). For further information and a copy of the written agreement it is possible to call Unisys at +1 215 986-4411, or email lzw_info(at)unisys.com.
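As a worked example of the schedule above (just an illustrative sketch; the function name is invented, and this is not licensing advice), the announced royalty amounts to a percentage of the unit selling price, clamped between the stated minimum and maximum:

    def gif_lzw_royalty(unit_price, gif_tiff=False):
        # 0.45% (min $0.10, max $10.00) per unit for GIF/LZW products;
        # 0.65% (min $0.20, max $25.00) per unit for GIF/TIFF/LZW products.
        rate, low, high = (0.0065, 0.20, 25.00) if gif_tiff else (0.0045, 0.10, 10.00)
        return min(max(unit_price * rate, low), high)

    print(gif_lzw_royalty(49.95))                # 0.224775, i.e. about $0.22
    print(gif_lzw_royalty(5.00, gif_tiff=True))  # 0.20: the minimum applies

For typical low-priced software of the time, the per-unit minimum, rather than the percentage, was often the operative figure.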

Any organization using LZW should consider whether it is infringing the Unisys patent. CompuServe is not involved in any of these discussions – they are between Unisys and outside developers.

Software Patents

Normally, procedures such as LZW are published in magazines so that they can be shared by the community of software developers. LZW itself is a refinement of other algorithms published in the years before (Ziv-Lempel and others). Software is usually protected by copyright law, but in recent years (since 1981 in the USA) in several countries it has become possible to patent software. Initially, only software used to control hardware could be patented. This interpretation was soon extended to include all types of software (except for “pure mathematical algorithms”). While software patents have become an opportunity for many, they remain a controversial danger for others. Any programmer or publisher might be trapped at any time by a patent infringement claim that could not be foreseen or avoided.

Publication of an algorithm in a magazine does not automatically exclude a patent application. In many countries, including the USA, it is possible to apply for a patent and still publish the paper without mention of the application. In the USA (but not in many other countries), the patent application may even be filed within 12 months of the publication. Under such regulations, the only algorithms that might be used freely and without risk would be those published prior to 1981 (e.g. Donald Knuth’s “The Art of Computer Programming”).

Today, even designing a graphics file format can become a programmer’s nightmare. One very active member of the internet community (and author of the GZIP compressor) has collected information on more than 350 patents on lossless data compression and 100 on lossy image compression. Lempel, Ziv, Cohn and Eastman patented their original LZ78 algorithm (US patent 4,464,650). The LZW algorithm which is now attracting so much attention is patented by both IBM (4,814,746) and Unisys (US patent 4,558,302, European patent 0,129,439 – Japanese and other patents pending), while British Telecom (BT) holds a similar patent. The IBM patent application was filed three weeks before that of Unisys, but the US patent office apparently failed to recognize that they covered the same algorithm. (The IBM patent is more general, but its claim 7 is said to be exactly LZW.)

The LZW patent, as well as its international counterparts, and similar patents filed by others, are expected to remain valid for at least 20 years from the original filing date of June 20, 1983, i.e. until midnight US Eastern Standard Time on June 19, 2003 for US patent 4,558,302. The European counterpart EP0129439 (covering at least Germany, the UK, France and Italy, as long as yearly fees are paid) was filed on June 18, 1984, and the Japanese counterpart JP7249996 was filed on June 20, 1984. Although the international counterparts mention June 20, 1983 as the priority date (which is relevant for determining novelty) the validity of these patents is 20 years from their filing date.
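As a small worked example of the dates above (a sketch with the patent numbers taken from this article; the exact expiry instant depends on how “midnight” is counted in each jurisdiction), the 20-year terms run from each filing date, not from the shared 1983 priority date:

    from datetime import date

    # 20-year terms counted from each filing date (not the priority date):
    filings = {
        "US 4,558,302": date(1983, 6, 20),
        "EP 0129439":   date(1984, 6, 18),
        "JP 7249996":   date(1984, 6, 20),
    }
    for patent, filed in filings.items():
        expiry = filed.replace(year=filed.year + 20)
        print(patent, "filed", filed, "-> term ends around", expiry)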

10 Years of LZW

While the original article on LZW was published in 1984, the LZW patent issue first surfaced in the press in 1989, when the BTLZ algorithm (a procedure similar to LZW, developed and patented by British Telecom) was to be approved for data compression in the V.42bis modem standard. Unisys said on at least one occasion that it first began to learn of the widespread use of LZW in connection with the development of this standard. The first licensing arrangements put into place included those with modem manufacturers ($20,000 for each one-time license) and with Adobe PostScript developers ($10,000).

An article on “LZW Data Compression” was published in the October 1989 issue of Dr. Dobb’s Journal (see the Bibliography section for more details). A reader replied in the December issue explaining that the algorithm was patented. The author of the article added that he was unaware of any patent on the algorithm. More readers wrote, and in the March 1990 issue the editor-in-chief dedicated his Editorial to this topic, which in his words “sparked a forest of fires”. The same issue also contained an official statement by Unisys Corporation, which confirmed that LZW was patented, mentioned the modem industry, and indicated how developers could contact Unisys.

In the October 2, 1989 issue of PC Week, “Spencer the Katt” wrote:

“Alas, there’s no consolation for developers of archiving programs that rely on the LZW data-compression algorithm. While cruising the bulletin boards last week, Spencer learned that Unisys has a patent on the algorithm, upon which a slew of data-compression programs are based. Watch out.”

Around the same time, an article in InfoWorld mentioned the fact that modem manufacturers were facing the possibility of having to pay royalties to Unisys and to other patent holders for the right to use LZW.

Page 132 (“LZWEncode Filter”) of the PostScript Language Reference Manual, Second Edition, published in December 1990, contains the address of the Welch Licensing Department at Unisys Corporation.

In the March 1991 issue of Byte, Steve Apiki (“Lossless Data Compression”) explained that LZW is used in GIF, and that “The [LZW] algorithm itself is patented by Sperry [now Unisys].”

At this point, at least the readers of some publications were potentially aware of the LZW patent. But there were still few links to GIF. Unisys apparently didn’t know about GIF, nor did most GIF developers know that GIF contained LZW technology. And those who may have known about the LZW connection did not necessarily know about the patent.

This issue was also discussed among a small group of the better informed members of the CompuServe PICS Forum (now GRAPHSUP). The general feeling at that time was that “Unisys only intends to get royalties from hardware vendors,” and there was some consensus on the idea that Unisys “wouldn’t do anything about pure software implementations”.

Until the end of 1994, discussions on CompuServe’s Information Service showed no clear mention of the requirement to get a license from Unisys for using LZW in GIF applications. During 1988, at least one developer stopped working on GIF tools because of concerns regarding the LZW patent, and reportedly “made CompuServe aware of it”. This apparently was limited to private verbal conversations, and no information to this effect could be found in the press or on CIS. Other developers are reported to have directly informed the Unisys licensing department about the use of LZW in GIF between 1988 and the end of 1989, but it is not known whether Unisys ever acknowledged this in writing. At about the same time, at least one CIS member, working for Unisys, reportedly informed Unisys management, “clearly pointing out the use of LZW in GIF”.

In 1996, attention was drawn to one of CompuServe’s public file libraries, which still contained an archive named “UNIGF2.ARC”, uploaded on July 13, 1988, titled “Decoder — Sperry, Unisys”, and described as a “Sperry PC high-res GIF display” program. The uploader however immediately clarified that all references to Unisys contained in the upload were meant to indicate a particular configuration of computer and graphics board, and not the patent.

Among the developers who contacted Unisys between the end of 1990 and the beginning of 1991, there was at least one GIF developer. He recently described his experience:

“Finding the right person was the most difficult part of licensing LZW, but hopefully it’s easier today (perhaps only 5 phone calls would be needed!)… When talking to Unisys back then, my recollection is that we had to basically tell the people at Unisys, ‘Believe me, you DO own a patent on LZW; who do we talk to about LICENSING?’ When we finally reached the licensing/legal department, THEY knew they had a patent, and spelled out the terms. I recall the person we were dealing with saying something like, ‘They [Unisys] laugh when I make all these $1 deals, but we have to charge something to protect the patent.'”

In those days, the standard license fee for PC-based software products was $1 per copy sold (or a 1% royalty), after a $100 advance payment. Apparently, Unisys still didn’t know that GIF was based on LZW. In January 1995, Unisys stated:

“Two years ago, Unisys learned that the LZW method was incorporated in the GIF specification and immediately began negotiations with CompuServe in January of 1993. We reached agreement with CompuServe on licensing the technology in June 1994…”

Two years before the Unisys statement, at the end of 1992, an Italian software house contacted Unisys because it was interested in a license for the possible use of LZW in its PostScript Level 2 drivers. That correspondence also mentioned GIF and TIFF as using LZW, and anticipated some of the controversies which would follow 25 months later. Unisys replied: “… You raise a number of interesting issues which require consideration…”

While disclosing the full contents of this correspondence would probably not serve anyone’s interest, the text of two letters sent to Unisys in 1992 is included at the end of this article [edited out of this version], because the author feels that this 1992 perspective could complement the article with a few interesting ideas. The letters have not been edited, so some details (such as the reference to ZIP) may be incomplete with respect to current knowledge.

Unisys offered that software house a $0.25 per-unit royalty (1% of the net income) as an alternative to the PostScript one-time license, but did not answer the question raised: “If we implemented a software GIF or TIFF image file loader and saver (both formats are based on the LZW algorithm), would we need a license from Unisys Corp., as far as U.S. Patent 4,558,302 is concerned?”. According to public statements, Unisys did however contact CompuServe the following month.

December 29, 1994 – The Days After

Between 1993 and 1994, the majority of developers still didn’t know that GIF employed a patented algorithm, although both Unisys and CompuServe were aware of this (as the developers would learn in December 1994). Different opinions have been expressed on this. Some developers feel that reaching an agreement behind the scenes was the least destructive thing that could be done. Other (at times passionate) opinions picked up on electronic media are similar to these three:

“Consider this. CompuServe admits to knowing about patent problems with the GIF file format as early as January of 1993. … We added GIF support… months after CompuServe admits knowledge of the patent problem… We relied on the information that was supplied to us by CompuServe. If CompuServe had told us the truth when they knew it, we never would have added GIF support…”

“If I chose to put GIF encode/decode functions in my software development toolkits, my main threat of legal liability would not come from Unisys, but rather from one of my customers being sued by Unisys, who would turn around and sue me for selling them some code that contained patented algorithms.”

“I still don’t have a clue what my situation is if I want to sell source and object code that imports and exports GIF images. I am not in the end-user app business, but my customers are, and they certainly will have to have an LZW license, but what about me? I’ve talked with Unisys by voice and E-mail, and the voice discussion was entirely unsatisfactory as I posted when it happened – basically the Unisys guy said anyone who sells code for $100-$300 a pop was a total _____ for selling it that cheap. The E-mail discussions I’ve had said ‘OK – we hear you – we’ll get back to you.’ Never happened.”

Unisys replied in part with reassuring clarifications to the general public, explaining that if the software had been developed prior to 1995, or if it was public domain or freeware, the developer need not worry:

“… Unisys does not intend to pursue previous inadvertent infringement by versions of GIF-based software products marketed prior to 1995… Unisys does not require licensing, or fees to be paid, for non-commercial, non-profit GIF-based applications, including those for use on the online services… Commercial developers… are expected to secure a licensing agreement with Unisys for software products introduced beginning in 1995, or enhancements of products that were introduced prior to 1995.”

However, these statements were followed by far more restrictive interpretations, both in early 1995 and in the summer of 1996. It soon became clear that Unisys could be demanding royalties for everything “manufactured” after 1994. One developer contacted Unisys and reported:

“I called the Unisys lawyer you referred me to and he confirmed this position. Even a book or CD containing *pre 1995* freeware is subject to royalties if the disk is put together in 1995… Royalties must be collected *again* for each update release.”

While the Unisys licensing policies announced on January 27, 1995 (and later narrowed to more restrictive but clearer definitions) enabled many software publishers to resume shipping their products after a month-long pause, other developers preferred to abandon GIF as early as 1995, waiting for a patent-free evolution of GIF. Comments included:

“What if I sign up and then they announce a new GIF specification which does not use LZW?”

“Labeling and user notification requirements in the agreement are ridiculous. I understand their desire to ‘spread the word’ about their patent, but they’re telling me that I have to provide far more info on their ownership of the patent than they require in the docs/packaging of modem manufacturers and other users of LZW. Fair is fair. A blurb in the online help and docs should be sufficient; a ‘non-defeatable’ splash screen at startup is going too far.”

“Unisys is attempting to control how we (and other shareware authors) do business, and to make us billboards for their LZW patent… By making me tell my users how many security backups they can make, etc., they’re telling me how to run my business and how to interface with my customers.”

“Imagine the nightmare of having to pay royalties to 10 patent holders, each of whom tells you how to run your business…”

“Unisys has given us a chance to work together to change the system – rather than waiting to be sued one by one for this patent or that. We can win the fight against software patents, if we speak loud and clear against them.”

Some of the most active developers decided to collaborate on the design of a patent-free evolution of GIF (and TIFF’s LZW compression mode). A method was quickly found to create uncompressed GIF files without using LZW code, while remaining compatible with existing GIF loaders. Also, a variety of different procedures and data structures (such as Shannon-Fano and AVL trees) have been used to compress data in ways similar, if not always equivalent, to LZW. But a diversity in procedures and data structures alone apparently does not escape the patent. As one expert said, “If the output data is [compressed] GIF, the compressor infringes the Unisys patent regardless of the algorithm.”
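The “uncompressed GIF” method mentioned above deserves a brief sketch. As commonly described, the encoder emits one literal code per pixel and inserts a clear code often enough that the decoder’s string table never grows past the 9-bit code range; since the encoder builds no string table at all, it performs no LZW compression, yet standard GIF loaders decode the stream correctly. The following is a conceptual illustration in Python (assuming 8-bit pixels; packing the 9-bit codes into the GIF data stream is omitted), not a verified implementation of any particular tool:

    def uncompressed_gif_codes(pixels):
        # For 8-bit pixels: literal codes 0-255, CLEAR = 256, END = 257.
        CLEAR, END = 256, 257
        codes = [CLEAR]
        since_clear = 0
        for p in pixels:
            codes.append(p)          # one literal code per pixel, no table lookups
            since_clear += 1
            # After 254 codes the decoder's table would grow to the point
            # where it switches to 10-bit codes; clearing here keeps every
            # code within 9 bits.
            if since_clear == 254:
                codes.append(CLEAR)
                since_clear = 0
        codes.append(END)
        return codes

The price is that the “compressed” data is slightly larger than the raw pixels (9 bits per 8-bit pixel, plus the periodic clear codes), which is why this was a legal workaround rather than a technical improvement.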

On January 16, 1995, CompuServe declared its intention to coordinate the development of GIF24, a freely usable successor to GIF capable of 24-bit lossless compression. Several developers invested a lot of time and energy in solving the Unisys patent problem, and rapidly worked out different modifications to the GIF specification. One of the better known efforts was the project for a “GEF” graphics-exchange format. GEF and GIF24 converged into PNG (officially an abbreviation of “Portable Network Graphics”, unofficially of “Png is Not Gif”).

The open architecture of PNG preserves the simplicity that made GIF so popular, and adds features such as true color. Test results indicate that PNG is capable of (losslessly) compressing true color images better than any other widely used image format. It is also more effective than GIF in storing palette-based images. More information on PNG is included in the Reference and Bibliography sections.

In the end, it appears that if so many efforts converged into a new, improved standard, part of the credit still has to go to the LZW patent…

Post Scriptum

Possibly in part as a reaction to commercial developers splitting GIF code out of their commercial applications and making it available through free distribution channels, Unisys later clarified that commercial developers in general would be subject to the minimum royalty ($0.10) even on software distributed for free. According to statements reportedly made by Unisys to individual developers, unlicensed freeware GIF software would be allowed only if it came from charitable institutions. Shareware organizations would have to both set a 30-day expiration limit in the (unregistered) program, and also disable the GIF component until the software is registered. Other types of unlicensed “freeware” distribution of GIF and other LZW-based software, regardless of the medium (books, internet, CD-ROM collections, etc.) would not be allowed.

At the end of summer 1996, the references to certain types of freely distributable programs, which until then had officially been allowed to be distributed without a license, had disappeared from the Unisys web site:

“To repeat our statement of January 6, 1995… Unisys also does not require licensing or fees to be paid for non-commercial, not-for-profit GIF-based applications… Shareware developers will not be required to pay license fees until the shareware user accepts the terms and restrictions required by the shareware developer and/or makes payment to the shareware developer or a designated Agent… No license or license fees are required for non-commercial, not-for-profit GIF-based applications or for non-commercial, not-for-profit GIF-freeware, so long as the LZW capability provided is only for GIF…”

The Press Releases archive on the Unisys web site was updated, and the original 1995 item titled “1-10-95 UNISYS CLARIFIES POLICY REGARDING LZW PATENT USE IN ON-LINE SERVICE OFFERINGS” was changed to point to a new public statement, which included:

“More and more people are becoming aware that the reading and/or writing of GIF images requires a license to use Unisys patented Lempel Ziv Welch (LZW) data compression and decompression technology, including United States Patent No. 4,558,302 and foreign counterpart patents. Since January of 1995, Unisys has entered into a large number of license agreements for use of GIF and other LZW-based technology. As a result of both this extensive licensing and changes in the use and marketing of GIF and other LZW-based products, Unisys has had to adjust its licensing policies to reflect these changes and the needs of existing and future licensees.”

In September 1996, the new approach was described by some developers:

“Unisys seems to have scrapped their freeware policy.”

“Anybody who is a shareware author and who is making a program containing LZW available for free (along with other programs which are sold for profit) will have to be targeted by Unisys. The odd thing is that we don’t have any other product and even though we are not exactly a charitable institution we certainly conform to the notion of freeware…”

“When I reminded [Unisys] of the many freeware products that are out there using the GIF format, [Unisys] said that – and this is a quote – ‘just because there are thieves out there doesn’t mean that you can act like a thief’.”

“For the second time in two years we had to change our plans. I am furious. [Three years ago] CompuServe and Unisys knew about the patent, and did not inform the community, leaving me and others to waste our time writing this software. Now the same is happening again: I took decisions last year, based on the public ‘clarifications’ by Unisys, and now they are just rewriting history as if they never said those things. This feels like Orwell’s 1984.”

“They should learn from Kodak, and apply to GIF clear, no-nonsense conditions and a simple one-time fee inclusive of a complete developer pack, as [Kodak] does for developers wanting to use the Photo CD format and patents.”

While the World Wide Web Consortium (W3C) officially endorsed the PNG specification as a “W3C Recommendation” and the major graphics packages now support PNG, as of 1998 most internet browsers still could not directly handle PNG. The popularity of GIF even received an unexpected boost when Microsoft, Netscape and others added support for GIF animations to their browsers. As a result, GIF became more difficult to replace with PNG, since PNG was not designed to support animation. The developers of PNG are currently completing a meta-PNG specification, called MNG, which provides superior animation features. It appears increasingly likely, however, that the LZW patent will have expired before both PNG and MNG are supported by a majority of web browsers. At the same time, the JPEG 2000 specification, also designed to be “patent free”, is combining technologies such as wavelet compression with support for image types which previously required the use of different file formats such as PNG, MNG, TIFF, JPEG and GIF.

In the second half of 1999 Unisys changed the wording on an LZW information page at its web site, and made available new server licensing options. These changes inspired a new wave of popular interest in the GIF/LZW issue. The substance of the matter however has not changed much since 1995: Unisys is not asking for any licenses in relation to GIF data files, but only for software implementing the LZW algorithm. Most commercial applications that create GIF files already have a GIF/LZW license from Unisys, so users and webmasters creating GIF files with these programs do not need to worry about getting a separate license. As for the renewed proposal to “burn all GIFs” and replace them with a format such as PNG, that is not yet completely possible, because PNG does not support animation, which is widely used with GIF.

As of June 20, 2004, the GIF/LZW-related patents mentioned in this text have expired.

(Any comments or experience you would like to share are always very much appreciated.)

Reference

The following sections have been edited out:

  • Excerpts from the PNG specification
  • Official announcements from CompuServe and Unisys
  • Developer correspondence

Bibliography

Adobe Systems Incorporated
“LZWEncode Filter”
PostScript Language Reference Manual, Second Edition
Addison-Wesley Publishing Company
ISBN 0-201-18127-4

Apiki, Steve
“Lossless Data Compression”
Byte, March 1991, pages 309-314, 386-387

Association of Shareware Professionals Forum
CompuServe GO ASPFORUM

Battilana, Michael C.
“LZW Data Compression without Hashing”
University of Udine Exam Project, July 9, 1987

Bell, Timothy C., Cleary, John G. and Witten, Ian H.
“Adaptive Dictionary Encoders”
Text Compression
Prentice Hall
ISBN 0-13-911991-4

Boutell, Thomas (Editor)
PNG (Portable Network Graphics) Specification
http://www.w3.org/pub/WWW/Graphics/PNG

Clay, Betty
“Texas Tales”
ICPUG Newsletter, January/February 1995, pages 18-23

Cloanto IT srl
Supplement to Personal Paint Manual
Version 6.1/1995, January 27, 1995

CompuServe Graphics Developers Forum (GO GRAPHDEV)

CompuServe Graphics Support Forum (GO GRAPHSUP)

Elmer-Dewitt, Philip
“Will Gates Get the Net?”
Time, January 30, 1995, Page 47

Erickson, Jonathan
“Patent Letter Suits” (Editorial)
Dr. Dobb’s Journal, March 1990, page 6

Erickson, Jonathan
“The Green, Green Cash of Gnomes” (Editorial)
Dr. Dobb’s Journal, April 1995, page 6

Gardner, Ray
“LZW Patent Issues” (Letter)
Dr. Dobb’s Journal, December 1989, page 8

Knuth, Donald E.
The Art of Computer Programming
Volume 3 / Sorting and Searching
Addison-Wesley Publishing Company
ISBN 0-201-03803-X

Landy, Gene K.
The Software Developer’s and Marketer’s Legal Companion
Addison-Wesley Publishing Company
ISBN 0-201-62276-9

Miles, J. B.
“Patent Issues May Stall Approval of New V.42bis Modem Standard”
InfoWorld, approximately fall of 1989, pages 43-44
[Author, article title and exact date not fully confirmed – information appreciated; presumably the InfoWorld article on LZW and modem implementations mentioned above]

Nelson, Mark R.
“LZW Data Compression”
Dr. Dobb’s Journal, October 1989, pages 29-36, 86-87

Nelson, Mark R.
“LZW Patent Issues” (Reply to Letter)
Dr. Dobb’s Journal, December 1989, pages 8-12

PNG (Portable Network Graphics)
Information and support material are available online from:

  • Internet http://www.w3.org/pub/WWW/Graphics/PNG/
  • Internet ftp://ftp.cdrom.com/png/
  • Internet http://quest.jpl.nasa.gov/PNG/
  • Internet ftp://swrinde.nde.swri.edu/pub/png-group/documents
  • Internet ftp://godzilli.cs.sunysb.edu/pub/ngf
  • Internet comp.graphics Newsgroups
  • Internet comp.sys.graphics Newsgroup
  • Internet png-list@dworkin.wustl.edu mailing list
  • Internet png-implement@dworkin.wustl.edu mailing list
  • Internet mpng-list@dworkin.wustl.edu mailing list
  • CompuServe Graphics Support Forum (GO GRAPHSUP)

“Spencer the Katt”
PC Week, October 2, 1989

Unisys Corporation
“Patented Algorithms” (Letter)
Dr. Dobb’s Journal, March 1990, page 8

US Patent 4,558,302
The patent text is available from free sources such as Google Patents.

Vaughan-Nichols, Steven J.
“Saving Space”, “Squeeze, Squash, and Crush” and “Legal Seagull”
Byte, March 1990, pages 237-243

Welch, Terry A.
“A technique for high-performance data compression”
IEEE Computer, June 1984, pages 8-19

Ziv, Jacob and Lempel, Abraham
“A universal algorithm for sequential data compression”
IEEE Transactions on Information Theory, May 1977, pages 337-343

Ziv, Jacob and Lempel, Abraham
“Compression of individual sequences via variable-rate coding”
IEEE Transactions on Information Theory, September 1978, pages 530-536

More articles on the GIF issue:

  • Wall Street Journal, January 4, 1995, page B8
  • Computerworld, January 9, 1995, page 6
  • Network World, January 9, 1995, page 4
  • Inter@ctive Week, January 16, 1995, page 7
  • Wall Street Journal, February 9, 1995, page B5

Special thanks to Dave, David, Diana, Frank, Jason, Jean-loup, Jon, Kevin, Larry, Pierce, Richard, Tim, Tom and many others for their precious help.