Hello All,
I’m in the process of completing an all-new program for use with Microsoft Outlook 2002 and Outlook 2000 called WOPR Junk Mail Remover. As its name implies, the program is an enhanced junk mail filter designed specifically for Microsoft Outlook 2002/2000.
The program uses Bayesian statistics incorporated into some fairly simple algorithms to create what is called a “Bayesian” or “Content-Based” e-mail filter. As new e-mail arrives in your Inbox, it is broken down into tokens (i.e., its individual words), including any HTML tags, RTF tags, embedded JavaScript, and Internet Headers. All of those tokens are then compared statistically against a corpus/dictionary of known “good” (i.e., “Non-Spam”) words and known “bad” (i.e., “Spam”) words. The combined probability of the most interesting “good” and “bad” words in the message (those that most strongly indicate Non-Spam or Spam) is then used to decide whether the newly arrived e-mail is junk mail. The filtering process works so well that it’s almost scary!
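For readers curious how a content-based filter like this works in general, here is a minimal sketch in the style of Paul Graham’s well-known “A Plan for Spam” approach. It is not the program’s actual code: the tokenizer regex, the 0.4 score for never-seen words, the clamp to [0.01, 0.99], and the choice of the 15 most interesting tokens are all assumptions borrowed from that general technique.

```python
import re
from math import prod

# Assumed tokenizer: runs of letters, digits, and a few symbols, so HTML tag
# names and header field names become tokens alongside ordinary words.
TOKEN_RE = re.compile(r"[A-Za-z0-9$'_-]+")


def tokenize(text: str) -> list[str]:
    """Split a raw message (headers, HTML, script) into lowercase tokens."""
    return [t.lower() for t in TOKEN_RE.findall(text)]


def spam_probability(tokens, token_probs, unknown=0.4, interesting=15):
    """Combine the most 'interesting' token probabilities into one score.

    token_probs maps token -> P(spam | token); unseen tokens get `unknown`.
    Each probability is clamped so no single token can decide on its own.
    """
    probs = [min(0.99, max(0.01, token_probs.get(t, unknown)))
             for t in set(tokens)]
    # The tokens farthest from a neutral 0.5 carry the most evidence.
    probs.sort(key=lambda p: abs(p - 0.5), reverse=True)
    top = probs[:interesting]
    if not top:
        return 0.5  # nothing to go on
    spam = prod(top)
    ham = prod(1 - p for p in top)
    return spam / (spam + ham)
```

With a hypothetical dictionary such as `{"free": 0.9, "viagra": 0.99}`, a message reading “FREE Viagra!!” scores close to 1.0, so a cut-off of, say, 0.9 (again an assumption) would mark it as junk.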
Since the Bayesian Filter is customized to each user’s individual e-mail habits, it must be trained (or taught) on each user’s system to recognize which e-mails are Non-Spam and which are Spam. Thus, when you first install the program, you train the Bayesian Filter by clicking the “Mark Message as Junk Mail” or “Mark Message as Non-Junk Mail” toolbar button as new e-mail arrives. Once about five or six messages have been trained/marked, the program will start to take off and begin filtering your newly arrived mail. However, the more you train the filter, the smarter it gets. I suggest a minimum of 50 Spam and Non-Spam trainings to start with, but more would be even better (I currently have around 300 of each trained and sitting in my corpus/dictionary).
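Training amounts to keeping per-word counts for each class. The sketch below is a hypothetical illustration (the `Corpus` class, its method names, and the five-message `minimum` are assumptions, not the program’s design): each “Mark Message” click adds the message’s tokens to the Spam or Non-Spam counts, and a word’s Spam probability is then estimated from how often it appears in each class.

```python
from collections import Counter


class Corpus:
    """Hypothetical training store: per-token counts for Spam and Non-Spam."""

    def __init__(self):
        self.spam = Counter()
        self.ham = Counter()
        self.n_spam = 0
        self.n_ham = 0

    def train(self, tokens, is_spam):
        """Rough equivalent of 'Mark Message as Junk/Non-Junk Mail'."""
        target = self.spam if is_spam else self.ham
        target.update(set(tokens))  # count each token once per message
        if is_spam:
            self.n_spam += 1
        else:
            self.n_ham += 1

    def ready(self, minimum=5):
        """True once enough messages have been trained to start filtering."""
        return self.n_spam + self.n_ham >= minimum

    def token_prob(self, token):
        """Estimate P(spam | token), or None if the token was never seen."""
        s = self.spam[token] / max(self.n_spam, 1)
        h = self.ham[token] / max(self.n_ham, 1)
        if s + h == 0:
            return None
        # Clamp so that a single token can never be absolutely certain.
        return min(0.99, max(0.01, s / (s + h)))
```

Dividing by the number of messages in each class keeps the estimate fair when one side of the corpus is much larger than the other. Graham’s original formula also weights “good” counts double to reduce false positives; this sketch leaves that out.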
To back up the Bayesian Filter, there is a complete set of secondary filters that can filter an incoming e-mail message on all of the usual criteria. (Actually, the program was originally written around these filters and I only recently added the Bayesian filter; however, since the Bayesian Filter worked so well, it quickly became the focal point of the program.) For example, you can define custom filters (using the included “Create New Filter Wizard” tool) to filter incoming messages based on the sender’s user name, the sender’s domain name, the message’s subject, the message’s text/body content, the message’s Internet Header fields, the message’s country of origin, etc. All of the standard (i.e., “built-in”) and “user-defined” filters are managed via the program’s Options dialog box (which is accessed via a button on the program’s main toolbar).
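A rule-based secondary filter can be thought of as a message field plus a pattern to look for. This is a minimal, hypothetical sketch (the `Message` and `RuleFilter` types, the field names, and the use of regular expressions are assumptions); it covers the user name, domain, subject, body, and header checks, but not country of origin, which would need an IP-to-country lookup.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Message:
    sender: str   # e.g. "deals@spam.example"
    subject: str
    body: str
    headers: str  # raw Internet Header block


@dataclass
class RuleFilter:
    """Hypothetical user-defined filter: a field name plus a regex."""
    name: str
    field: str    # "user", "domain", "subject", "body", or "headers"
    pattern: str

    def field_value(self, msg: Message) -> str:
        user, _, domain = msg.sender.partition("@")
        return {"user": user, "domain": domain, "subject": msg.subject,
                "body": msg.body, "headers": msg.headers}[self.field]

    def matches(self, msg: Message) -> bool:
        return re.search(self.pattern, self.field_value(msg),
                         re.IGNORECASE) is not None


def first_match(msg: Message, filters) -> Optional[str]:
    """Return the name of the first filter that fires (the 'reason'), or None."""
    for f in filters:
        if f.matches(msg):
            return f.name
    return None
```

Returning the filter’s name, rather than just True or False, gives the caller a ready-made reason string to record with the flagged message.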
The program also lets you define a white list, or “Friends” list; anyone on that list automatically bypasses all of the filters, so you are assured of getting their e-mail without it being trapped by one of the filters (this also speeds up the program, since mail from friends doesn’t have to be filtered). The program alerts you when new e-mail arrives and keeps a running total of the number of messages that have arrived (i.e., accepted e-mail, e-mail from friends, possible junk mail, and confirmed junk mail). Filtered e-mail is flagged in Outlook (using standard red and white Outlook flags) with the reason it was filtered, and its mail status icon is changed accordingly so that you can easily identify filtered mail. Confirmed junk mail is automatically moved into a “Junk Mail” folder, which can be set up to automatically purge its contents after a set number of days have passed since each message was first received/filtered (the Junk Mail folder’s location can be changed as well).
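The Friends-list shortcut and the automatic purge are both simple checks. The sketch below is hypothetical (function names, case-insensitive exact-address matching, and the `run_filters` callback are assumptions). The key point is that the whitelist test runs first, so the filters are never invoked for a friend, which is where the speed-up comes from.

```python
from datetime import datetime, timedelta


def is_friend(sender: str, friends: set) -> bool:
    """Hypothetical Friends-list check: exact address, case-insensitive.

    `friends` is assumed to hold lowercase addresses.
    """
    return sender.strip().lower() in friends


def route(sender: str, friends: set, run_filters):
    """Friends bypass every filter; everyone else goes through run_filters()."""
    if is_friend(sender, friends):
        return "friend"
    return run_filters()


def expired(received: datetime, now: datetime, keep_days: int) -> bool:
    """True if a Junk Mail item is older than keep_days and may be purged."""
    return now - received > timedelta(days=keep_days)
```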
There’s even a neat little “Message Details” tool that lets you view the plain text content of a message along with all of its Internet Headers. The tool also has an option for showing a “Word Analysis” of the message, in which the individual tokens/words are colorized according to their probability of being Spam. That way you get a bird’s-eye view of how the Bayesian filter sees the message. I personally hate opening Spam because 99% of it is written in HTML format and Outlook always tries to access the Internet to download the graphics for HTML messages. This little tool prevents that, since all you see is the plain text of the message with all of the HTML stripped out (which is great for looking at messages that you aren’t sure are Spam or not).
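Both halves of this tool can be sketched briefly. The code below is an illustrative assumption, not the program’s implementation: `plain_text` uses Python’s standard `html.parser` to keep only visible text (so nothing remote, such as images, is ever fetched), and `color_for` maps each word’s Spam probability to a color band. The band edges and color names are hypothetical.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collects visible text and drops tags, scripts, and styles."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0  # depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def plain_text(html: str) -> str:
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    # Join pieces with spaces so words in adjacent blocks don't run together.
    return " ".join(" ".join(parser.parts).split())


def color_for(prob):
    """Hypothetical color bands for the Word Analysis view."""
    if prob is None:
        return "gray"        # word not in the corpus yet
    if prob >= 0.9:
        return "red"
    if prob >= 0.6:
        return "orange"
    if prob <= 0.1:
        return "green"
    if prob <= 0.4:
        return "lightgreen"
    return "black"           # roughly neutral


def word_analysis(tokens, token_prob):
    """Pair each token with its display color."""
    return [(t, color_for(token_prob(t))) for t in tokens]
```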
The program has another useful little tool that sends an “Unknown User” error message back to the originator of a junk mail message, informing the Spammer that the e-mail address they sent their Spam to is invalid (even though it really isn’t). That way, the Spammer will think that your e-mail address is invalid and will hopefully remove you from their list. A standard error message is provided, but it can be fully customized, and the default action of any filter can be set up to send the error message to the sender automatically, or you can send it manually from the program’s main toolbar at any time.
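To make the idea concrete, here is a hypothetical sketch of how a customizable “Unknown User” reply might be assembled with Python’s standard `email` package. The default wording, the `MAILER-DAEMON` sender name, and the function name are assumptions for illustration. The message is only built here; actually delivering it (for example, over SMTP) is left out.

```python
from email.message import EmailMessage
from string import Template

# Hypothetical default text; $recipient is replaced with your address.
DEFAULT_TEMPLATE = Template(
    "This message was created automatically by the mail system.\n\n"
    "Delivery to the following recipient failed permanently:\n\n"
    "    $recipient\n\n"
    "550 5.1.1 Unknown user\n"
)


def build_unknown_user_reply(spam_from, recipient, subject, mailer_domain,
                             template=DEFAULT_TEMPLATE):
    """Build (but do not send) an 'Unknown User' reply to a junk mail sender."""
    msg = EmailMessage()
    msg["From"] = f"Mail Delivery Subsystem <MAILER-DAEMON@{mailer_domain}>"
    msg["To"] = spam_from
    msg["Subject"] = f"Undeliverable: {subject}"
    msg.set_content(template.substitute(recipient=recipient))
    return msg
```

Keeping the body in a `Template` means a user-customized message only has to include the `$recipient` placeholder.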
If all of this sounds like a lot, it is… The program is quite a piece of work in my opinion and I wouldn’t want to be without it. It’s very addicting…
Anyway, as I’ve already mentioned, the program will run under Windows 98 and Outlook 2000, but it really shines when you install it on a system running Windows Me, Windows 2000, or Windows XP (Home Edition or Pro) and Outlook 2002. The reason is that the program does some fairly fancy API work to display information messages via the Windows System Tray, and that feature only works on systems with version 5.0 or later of the Windows Shell library (i.e., any Microsoft operating system newer than Windows 98 and NT 4.0). It also looks a lot better if your display is set to more than 256 colors (although it has been tested and runs just fine on 16-color displays and with Shell versions earlier than 5.0; in those cases, the info messages are displayed via the Office Assistant instead).
The program includes an Installer and an Uninstaller, so installation should be a total breeze (just follow the on-screen prompts). After you’ve installed the program and run Outlook for the first time, the program will build its backend database and then prompt you to create an account (since there’s no programmatic way of getting the account info from Outlook itself). The account info is simply used by the program to determine your e-mail addresses (for filtering purposes) and to send the “Unknown User” error messages described above. The “Unknown User” Error Message feature requires a POP3 or IMAP account in order to deliver the error message anonymously (again, since Outlook itself doesn’t allow such things). Once you’ve set up your accounts, the program will prompt you to import your address books into your “Friends” list. After that, you are ready to begin training the program (by selecting one or more messages and clicking the “Mark Message as Junk Mail” or “Mark Message as Non-Junk Mail” button on the program’s main toolbar). Once you’ve trained it enough, just sit back and watch it filter all of that junk mail. It’s absolutely amazing!
I’ve tried very hard to keep the filtering process as fast as possible, but as you might expect, filtering a large number of messages can take some time (since each incoming message must be broken down into its individual words and then compared against the “good” and “bad” word dictionaries, etc.). Thus, it works best if you let Outlook check for new mail automatically at regular intervals, so that only a few messages need to be filtered at a time.
Anyway, after many months of hard work I think that I’m finally at a stage where I could use some other folks looking at the new “WOPR Junk Mail Remover” program for me. While it still has some minor flaws and still needs a few more loose ends tied up, all in all, I feel that it is working quite spectacularly. In the last week alone, it has removed over 1,000 pieces of junk mail from my Inbox with less than 1% false positives (and the only reason I think I’m getting the false positives is because I don’t receive enough “good” e-mail to train the filter as well as I can with all of the “bad” e-mail I get. Once I can get an even balance of “good” and “bad” e-mail trained, I think it will be near flawless in its filtering).
Thus, I’m looking for a very limited number of beta testers who would be interested in helping me test the new program and get it ready for final release. I’m only looking for around 10 or so serious testers who have time available to test the program over the next two weeks and are willing to provide useful feedback (I’m not only interested in finding bugs; I’d also like to know how you feel about the user interface, whether there is anything I can do to make the program easier to use/understand, etc.). If you are interested in testing the program, please drop me a line at beta@wopr.com letting me know your current operating system (including any installed service packs), your current version of Office/Outlook (including any installed service releases), and, if you are running Outlook 2000, which mode you are running it in (i.e., Internet Mail Only (IMO) mode or Corporate or Workgroup (C/W) mode). Please remember that I can’t accept everyone, as the available testing slots are very limited.
I’m looking for one or two of the following testers:
1. Someone running Outlook 2002/2000 on Microsoft Exchange Server (under any O.S.).
2. Someone running Outlook 2000 in Corporate or Workgroup mode (under any O.S.).
3. Someone running Outlook 2000/2002 under Windows 98.
4. Someone running Outlook 2000/2002 under Windows 98 SE.
5. Someone running Outlook 2000/2002 under Windows XP Home Edition.
6. Someone running Outlook 2000/2002 under Windows XP Home Edition SP1.
7. Someone running Outlook 2000/2002 under Windows XP Pro.
8. Someone running Outlook 2000/2002 under Windows XP Pro SP1.
9. Someone running Outlook 2002 SP2 (under any O.S.).
10. Someone running Outlook 2000 with WordMail enabled (under any O.S.).
11. Someone running Outlook 2002/2000 with a dial-up connection (under any O.S.).
12. I’m also looking for feedback on how the program looks at a variety of color depths and screen resolutions, so if you don’t mind switching your display between 16 colors, 256 colors, High Color, and True Color, or between 640×480, 800×600, and/or 1024×768 or higher resolutions, then you’d make a good testing candidate as well.
Thanks for your interest in the new program and I look forward to hearing your comments…