Precursor

Note: The outcomes of these tests, as well as performance and other comparisons of IMAP and web mail systems, led us to develop an in-house open source web mail solution: AlphaMail.

The project has proved quite successful, and is scaling very well. See the performance data on the AlphaMail home page for more information.

IMAP and POP Tests

Overview

For The Impatient

Here are my general findings so far:

  • UPDATED Jan. 2006: Running benchmarks on Dovecot 1.0-test45. No summarized results available yet.
  • IMPORTANT: Much of this page is severely out of date, and I do not have time to run more tests on the current releases of Dovecot, though the procedures are well-documented here. I would appreciate hearing data from others who run these kinds of tests, so I can include or link to your results.

General

This document covers a set of performance tests run against UW-IMAPD/POPD and Dovecot IMAPD. The emphasis was on the amount of I/O that had to be accomplished for the specific task to complete, since CPU times have very little to do with real performance when most of the operations are disk-related. We use the mbox format, so these benchmarks are related to that format. If you are looking for maildir benchmarks, try Google for now...I may add stats on that format later.

The wall time for each benchmark was also recorded, and in general should reflect the cost of the relative I/O (i.e. sequential reads do better than random writes). All benchmarks were run on the same machine, with the same disk drive, and attempts were made to ensure that operating system memory caches were not used for the mailboxes by clearing the in-memory caches before each test. I also ran a mock login to the daemon in question, so that its being paged back into memory after the cache clear would not be reflected in the measurements.

What I Saw: IMAP (Feb. 2004)

The general results indicate that Dovecot can reduce your disk I/O by an order of magnitude for certain operations; however, the results also show that certain (possibly common) usage patterns can actually cause Dovecot to behave as badly as UW IMAP (doing a lot of random writes), or even worse.

To be fair, the documentation for Dovecot does recommend using a format other than mbox, though UW IMAP manages to do far fewer writes for a set of message deletes than Dovecot, which has to manage indexes (that must be accessed with random I/O).

Many of our users use web mail, which is served by IMHO. I tried to do a few IMHO-related tests to see what kind of improvement Dovecot would bring to general IMHO users. The results were not as good as I'd hoped, but were still worth looking at. Dovecot reduces disk reads by a factor of 2 to 20, but it performs more writes than UW IMAP.

Some operations, such as deleting a set of messages from the middle of the mailbox, were actually a bit more costly than with UW IMAP, with nearly equal reads, but nearly twice as many writes.

Dovecot really shines for users who often ask "is there new mail" using a SELECT request (Dovecot supports POP, too). For an average-sized INBOX (6MB), the cost of such a request when there is no new mail is 100 times less than that of UW.
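The "100 times" figure can be checked against the read counts in the tables further down. This is only a worked calculation using two numbers from those tables: 6340 KB read per user for UW IMAP's SELECT INBOX, and 52 KB for Dovecot 0.99.10.4's SELECT on a previously accessed account. The function name is made up for illustration.

```python
def read_ratio(uw_kb, dovecot_kb):
    """How many times more KB UW IMAP read than Dovecot for the same test."""
    return uw_kb / dovecot_kb


# Values taken from the Test group 2 and OLD Test group 1 tables.
ratio = read_ratio(6340, 52)
print(f"UW IMAP read about {ratio:.0f} times as much")  # prints 122
```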

Proper IMAP clients are not nearly as hard-hitting as web clients; however, they all do a "SELECT INBOX" when you initially start them up, and they use FETCH to retrieve messages. The initial SELECT on Dovecot is only a real win when no new mail has arrived; however, once that SELECT has been done, the FETCH commands beat UW IMAP by up to TWO orders of magnitude. There is a catch: when I reduced the items that were cached (which improved some results), FETCH did not behave nearly as well.

The mixed request tests (like the one for IMHO) should be given more weight, because they are more indicative of real loads than the specialized tests.

What I Saw: POP

What I saw indicated that Dovecot should not be used as a POP daemon unless your users leave lots of messages on the server, check every few minutes for new mail, and rarely get that new mail. If you have a lot of users who have small mailboxes, then Dovecot is a huge loss.

The overhead of the indexes buys almost nothing. I would suggest that the author consider rewriting the POP server to simply keep track of the time stamp and location of last message served so that queries to a box that has no new messages can return immediately. Perhaps using a cutoff size of 2-3MB before it ever considers actually fooling with any kind of indexes would help, but in the current state, I would not recommend Dovecot POP for mbox format systems.
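The suggested shortcut might look roughly like the sketch below. It is not how Dovecot or UW POP work; it only illustrates the idea, under the assumption that "location of last message served" can be taken as the mbox file size at the end of the previous session. The function names are hypothetical.

```python
import os


def snapshot(path):
    """Record the mbox size (end offset of the last message served) and its mtime."""
    st = os.stat(path)
    return st.st_size, st.st_mtime_ns


def no_new_mail(path, last_snapshot):
    """True when size and modification time both match the stored snapshot,
    so the server could answer without reading the mailbox at all."""
    return snapshot(path) == last_snapshot
```

A server using this would store the snapshot at the end of each session and fall back to a full scan only when the comparison fails.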

The POP results were bad enough on initial inspection that I didn't bother writing further test scripts, because I disqualified Dovecot from our use and stopped spending time evaluating it. That is the reason there are no wall times or links to scripts.

The recorded performance data (along with links to the scripts) are shown below.

Performance Measurement and Data

Procedures

To ensure that the numbers have a minimum amount of systematic error, the tests were run on an otherwise idle Ultra 10 running Solaris 8. To reduce the possibility that I/O would be "missed" due to caching, a C program was run before each test that tried to memory map a very large file into memory, and touch each page. This has the effect of forcing the VM system to throw out any nonessential pages.
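The C program itself is not reproduced here. As a rough illustration of the technique only, the Python sketch below maps a file and reads one byte from every page. The function name is hypothetical, and whether this actually evicts other cached pages depends on the file being much larger than physical memory and on the operating system's VM policy.

```python
import mmap
import os


def touch_every_page(path, page_size=mmap.PAGESIZE):
    """Map the file at `path` read-only and read one byte from each page.

    Returns the number of pages touched.
    """
    size = os.path.getsize(path)
    if size == 0:
        return 0  # an empty file cannot be memory mapped
    touched = 0
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            for offset in range(0, size, page_size):
                _ = m[offset]  # reading the byte forces the page to be faulted in
                touched += 1
    return touched
```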

After clearing the cache, the test script was run once on an extraneous user in order to make sure the IMAP daemon itself was paged into memory for the operations, so that I/O due to running the program itself was not recorded in the stats.

I/O statistics were pulled using iostat, and were tallied by a Perl script. The Perl script ignored the first sample or two (which are usually errant), and the actual tests were not begun until the script reported that it had started recording.
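The Perl script is not shown on this page. The tallying step could be sketched as follows, assuming the iostat samples have already been parsed into (KB read per second, KB written per second) pairs taken at a fixed interval. The function name, the sample layout, and the default of skipping two samples are assumptions for illustration.

```python
def tally_io(samples, interval_s, skip=2):
    """Sum KB read and written from per-interval rate samples.

    `samples` is a list of (kb_read_per_s, kb_written_per_s) tuples.
    The first `skip` samples are dropped because they tend to be errant.
    """
    kept = samples[skip:]
    read_kb = sum(r * interval_s for r, _ in kept)
    written_kb = sum(w * interval_s for _, w in kept)
    return read_kb, written_kb
```

Dividing the two totals by the number of users in the run gives per-user figures like those in the tables.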

The tests were run from scripts on a remote host using standard IMAP and POP protocols (ports 143 and 110). The tests represent common operations done by IMAP clients. The tests were timed (wall time), and the I/O statistics were recorded on the IMAP server, which was otherwise idle.
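For readers who want to run something similar, a single timed SELECT from a remote host might be sketched like this with Python's standard imaplib. The original scripts were written in Perl and are not reproduced here; the helper names, host, and credentials are placeholders.

```python
import imaplib
import time


def timed(action):
    """Run `action()` and return (result, wall_seconds)."""
    start = time.perf_counter()
    result = action()
    return result, time.perf_counter() - start


def select_inbox_once(host, user, password, port=143):
    """Log in, time a single SELECT INBOX, log out; returns wall seconds."""
    conn = imaplib.IMAP4(host, port)
    try:
        conn.login(user, password)
        _, seconds = timed(lambda: conn.select("INBOX"))
    finally:
        conn.logout()
    return seconds
```

Only the SELECT is timed here, so connection setup and login do not count toward the wall time.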

Each test was run via a Perl script which went through a sequence of users with identical mailbox content so that the statistics for a single user could be obtained by averaging. Standard deviations on time measurements are shown in many of the results. I improved this script over time, and SDs appear on all of the stats I recorded after this modification.
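The per-user averaging could be done as in the sketch below. It assumes the sample standard deviation; the page does not say which form the Perl script used.

```python
import statistics


def summarize(values):
    """Return (mean, standard deviation) for one measurement across all users."""
    mean = statistics.mean(values)
    # stdev needs at least two data points
    sd = statistics.stdev(values) if len(values) > 1 else 0.0
    return mean, sd
```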

The results are tabulated below. The tests were run in the sequence shown. Some tests were run more than once to see if the tests previous to them changed their performance (e.g. how does a delete affect the next select?).

UPDATED Test (Dovecot 1.0-test45)

  • Mailbox size: 6641498 bytes
  • Number of messages: 669
  • Number of Users: 20
  • IMAP Server: Dovecot 1.0-test45
  • Configured to cache: MessagePart
  • Mailbox format: Single file (mbox) in user's home directory
  • Locking: dotlock fcntl
TBD
Test | Avg Wall Time | Avg Reads/User (KB) | Avg Writes/User (KB) | Notes
SELECT INBOX on "virgin" account. | 0.9 +- .15 s | 4200 | 400 | Much better numbers than the previous version tested. In fact, somehow Dovecot seems to be reading less than the actual size of the INBOX. I assume the author is using a seeking algorithm based on content sizes in message headers.
SELECT INBOX on previously accessed account | 0.13 +- .06 s | 38 | 76 | No real change. Very good numbers.
SELECT INBOX on previously accessed account after new mail delivery | 0.11 +- .06 s | 39k | 115k | Again, much improved numbers. The new version now has sanity checks to keep from re-reading the whole thing.
"Next Page" and "View" | | | |
Fetch (Flags/structure, header, then body) | 0.24 +- .02 s | 56k | 148k |
Delete | 3.0 s | 6500 | 9700 | See next row...
Delete 2 | 2.8 s | 6500 | 6600 | I talked to the author and he pointed out that the X-Status headers had to be added on the first delete run, which accounted for the extra writes. So, I ran this second delete to see what would happen. Seems to be better.
SquirrelMail operations after Delete | 0.41 +- .04 s | 330 | 400 |
IMHO general performance test | 5.2 +- .4 s | 6700 | 6300 | (OUCH) For some reason, we ended up with a huge impact here on writes.
IMHO general performance test (after new mail delivery) | TBD | | |


OLD Test group 1 (Dovecot 0.99.10.4)

  • Mailbox size: 6641498 bytes
  • Number of messages: 669
  • Number of Users: 40
  • IMAP Server: Dovecot 0.99.10.4
  • Configured to cache: MessagePart Envelope
  • Mailbox format: Single file (mbox) in user's home directory
  • Locking: dotlock fcntl
Test | Avg Wall Time | Avg Reads/User (KB) | Avg Writes/User (KB) | Notes
SELECT INBOX on "virgin" account. | 5.2 s | 9200 | 3500 | This forced Dovecot to create its indexes (which were stored in ~/mail/.imap), which accounts for the high I/O and runtime. This should only happen on initial conversion, and perhaps due to disk-based mail access between IMAP sessions. Note that the writes are likely random, which are extremely costly, as is reflected in the wall time (compare to UW's SELECT, which is a big sequential read).
SELECT INBOX on previously accessed account | 0.042 +- .009 s | 52 | 16 | This would be the case should a user log out of IMHO, and then back in later when no new messages have arrived.
SELECT INBOX on previously accessed account after new mail delivery | 1.07 +- .08 s | 4300 | 910 | This would be the case for logging into IMHO after having received some new mail.
"Next Page" and "View" | 0.3 s | 393 | 128 | This script does several IMAP operations that I found by sniffing the network traffic of SquirrelMail during a next page and then a message view operation. It should be a fairly common set of operations.
Fetch (Flags/structure, header, then body) | 0.12 s | 70 | 83 | This test does three fetches. The first is on the flags and structure, the second on the header, and the final on the first body element. The actual message size was about 512 bytes.
Delete | 3.3 s | 8020 | 11070 | This test flags 10 messages as deleted (10-20 out of 669), and then expunges them.
SquirrelMail operations after Delete | 0.5 s | 767 | 669 |
IMHO general performance test | 2.67 +- .23 s | 488 | 458 | This benchmark simulates the traffic from a common set of IMHO requests, including viewing a few messages and deleting them. This iteration was run on a mailbox that had not been modified since the last time Dovecot had a look at it.
IMHO general performance test (after new mail delivery) | 3.53 +- .30 s | 4553 | 1202 | This benchmark simulates the traffic from a common set of IMHO requests, including viewing a few messages and deleting them. This reflects the test if Dovecot had to muck with the indexes due to new mail delivery.
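The Fetch and Delete tests described in the table can be approximated with imaplib as sketched below. The exact IMAP data items the original scripts requested are not given on this page, so BODYSTRUCTURE, BODY[HEADER], and BODY[1] are assumptions standing in for "structure", "header", and "first body element". The message range is left as a parameter because the exact bounds used are not stated precisely.

```python
import imaplib

# Assumed data items for the three fetches: flags and structure, header, first body part.
FETCH_STEPS = ("(FLAGS BODYSTRUCTURE)", "(BODY[HEADER])", "(BODY[1])")


def run_fetch_test(conn, msg_num=1):
    """Issue the three FETCH commands against one message on a selected mailbox."""
    for parts in FETCH_STEPS:
        conn.fetch(str(msg_num), parts)


def run_delete_test(conn, first, last):
    """Flag messages first..last (inclusive) as deleted, then expunge them."""
    conn.store(f"{first}:{last}", "+FLAGS", "\\Deleted")
    conn.expunge()
```

Both functions expect `conn` to be an `imaplib.IMAP4` connection that has already logged in and selected INBOX.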

Test group 2 (UW IMAP)

  • Mailbox size: 6641498 bytes
  • Number of messages: 669
  • Number of Users: 40
  • IMAP Server: UW IMAP (from Pine 4.58 dist)
  • Mailbox format: Single file in user's home directory
  • Locking: dotlock
Test | Avg Wall Time | Avg Reads/User (KB) | Avg Writes/User (KB) | Notes
SELECT INBOX | 0.65 +- .156 s | 6340 | 54 | This only had to be run once, since UW does not do anything that improves performance after the "first" access.
"Next Page" and "View" | 1.64 s | 11640 | 126 |
Fetch (Flags/structure, header, then body) | 0.57 s | 6360 | 54 | This test does three fetches. The first is on the flags and structure, the second on the header, and the final on the first body element. The actual message size was about 512 bytes.
Delete | 1.44 s | 8690 | 6380 |
SquirrelMail operations after Delete | 1.53 s | 11590 | 118 |
IMHO general performance test | 4 +- .48 s | 13559 | 352 | This benchmark simulates the traffic from a common set of IMHO requests, including viewing a few messages and deleting them.

OLD Test group 3 (Dovecot 0.99.10.4) with less caching

  • Mailbox size: 6641498 bytes
  • Number of messages: 669
  • Number of Users: 40
  • IMAP Server: Dovecot 0.99.10.4
  • Configured to cache: MessagePart
  • Mailbox format: Single file (mbox) in user's home directory
  • Locking: dotlock fcntl
Test | Avg Wall Time | Avg Reads/User (KB) | Avg Writes/User (KB) | Notes
SELECT INBOX on "virgin" account. | 4.42 +- .39 s | 6700 | 2700 |
SELECT INBOX on previously accessed account | 0.074 +- .021 s | 59 | 17 |
SELECT INBOX on previously accessed account after new mail delivery | 1.025 +- 0.128 s | 3900 | 410 |
"Next Page" and "View" | 0.69 +- 0.11 s | 3800 | 250 |
Fetch (Flags/structure, header, then body) | 0.32 +- 0.12 s | 450 | 570 |
Fetch (Flags/structure, header, then body), 2nd run | 0.70 +- 0.20 s | 4100 | 470 |
Fetch (Flags/structure, header, then body), 3rd run | 1.25 +- 0.21 s | 5100 | 1490 | I noticed that fetch was a bit flaky. If I cleared the cache, I got worse performance on successive iterations with no modifications to the INBOX at all, with increasing I/O. Strange.
Fetch (Flags/structure, header, then body), 4th run with no cache clear before running | 0.16 +- 0.05 s | 14 | 640 | Strange that it would want to make this many writes. It could have been left-over cached data from the last one, so I ran it again with sync in front and got the next set of numbers. The low read number makes sense since I didn't clear the cache.
Fetch (Flags/structure, header, then body) | 0.56 +- 0.07 s | 1600 | 550 | This test was run immediately after the last fetch, but with a sync right before it. I expected much better numbers, but they actually got worse. No idea why yet.
Delete | 2.39 +- 0.62 s | 6600 | 9900 |
Delete (2nd run) | 2.26 +- 0.54 s | 8200 | 11000 | This second run looked more like the earlier run (group 1), and shows that either my I/O measurement is off, or what I ran before the initial delete had more stuff properly indexed. In other words, perhaps running a SELECT or something before the test makes it easier to start the delete process.
Delete | 2.10 +- 0.50 s | 8100 | 11000 | I ran a select, and then cleared the cache right before this one. It did not seem to affect it. It is possible that my first measurement failed to record all of the I/O, since I now have three separate samples (see group 1) that reflect about the same thing for this test.
IMHO general performance test | 2.57 +- 0.20 s | 4000 | 660 |
IMHO general performance test (after new mail delivery) | 2.57 +- 0.20 s | 4200 | 2100 |

Test group 4: POP (Dovecot 0.99.10.4)

  • Mailbox size: See table
  • POP Server: Dovecot 0.99.10.4
  • Configured to cache: MessagePart
  • Mailbox format: Single file (mbox) in user's home directory
  • Locking: dotlock fcntl
Test Sequence | Mailbox size | Avg Reads/User (KB) | Avg Writes/User (KB) | Notes
1st FETCH (leave messages on server) | 6.4MB | 7500 | 1400 | I used Mozilla Mail as the POP client.
2nd FETCH | 6.4MB | 4300 | 504 |
3rd FETCH (leave messages on server) | 6.4MB | 1000 | 232 |
4th FETCH (after new mail delivery) | 6.4MB | 4400 | 1000 |
5th FETCH | 6.4MB | 1000 | 0 |

Test group 5: POP (UW)

  • Mailbox size: See table
  • POP Server: UW POP 2003.83
  • Mailbox format: Single file (mbox) in user's home directory
  • Locking: dotlock
Test Sequence | Mailbox size | Avg Reads/User (KB) | Avg Writes/User (KB) | Notes
1st FETCH (leave messages on server) | 6.4MB | 6500 | 3800 | I used Mozilla Mail as the POP client. Not sure why I got writes, other than it writing in message IDs...perhaps I missed some writes.
2nd FETCH | 6.4MB | 6500 | 312 |
3rd FETCH (leave messages on server) | 6.4MB | 6500 | 312 |
4th FETCH (after new mail delivery) | 6.4MB | 7500 | 80 |
5th FETCH | 6.4MB | 6500 | 40 |

Test group 6: POP on small mailbox (Dovecot 0.99.10.4)

  • Mailbox size: See table
  • POP Server: Dovecot 0.99.10.4
  • Configured to cache: MessagePart
  • Mailbox format: Single file (mbox) in user's home directory
  • Locking: dotlock fcntl
Test Sequence | Mailbox size | Avg Reads/User (KB) | Avg Writes/User (KB) | Notes
1st FETCH (leave messages on server) | 2 KB | 800 | 500 | OUCH!
2nd FETCH | 2 KB | 390 | 128 | OUCH!

Test group 7: POP on small mailbox (UW)

  • Mailbox size: See table
  • POP Server: UW POP 2003.83
  • Mailbox format: Single file (mbox) in user's home directory
  • Locking: dotlock
Test Sequence | Mailbox size | Avg Reads/User (KB) | Avg Writes/User (KB) | Notes
1st FETCH (leave messages on server) | 2 KB | 8 | 60 |
2nd FETCH | 2 KB | 41 | 40 |

Test group 8: Cyrus

  • Mailbox size: 6641498 bytes
  • Number of messages: 669
  • Number of Users: 40
  • IMAP Server: Cyrus 2.2.3
  • Mailbox format: Cyrus
  • Options:
Test | Avg Wall Time | Avg Reads/User (KB) | Avg Writes/User (KB) | Notes
SELECT INBOX | 0.065 +- .02 s | 56 | 65 |
"Next Page" and "View" | 0.49 +- .06 s | 880 | 970 |
Fetch (Flags/structure, header, then body) | 0.11 +- .03 s | 70 | 70 |
Delete | 0.39 +- .10 s | 780 | 1130 |
IMHO general performance test | 2.61 +- .32 s | 960 | 3000 |

Notes on Dovecot 1.0-test45

  • It is not an official release yet.
  • I had a problem compiling on Solaris, and had to patch using instructions from the author.
  • Once it was compiled cleanly, I did not see further problems.
  • Need to see how well it deals with external mailbox changes.

Notes on Dovecot v0.99.10.4

  • It is not an official release yet (v 0.99). May be unstable.
  • There has been some traffic on the mailing list in the past about certain spam causing it to crash when using maildir (we use mbox). No confirmation of whether this is specific to just the maildir side.
  • There were a couple of mailing list messages about index problems when using mbox. It looked like they got fixed, but I have seen some behavior that I cannot explain in recent tests (see FETCH in set 3).
    It is worth keeping in mind.
  • Please see http://dovecot.procontrol.fi/clients.html. It looks as if Outlook may have an issue or two, though there are workarounds.
  • There have been complaints about making it work with Eudora, though this seems to be a client problem...need to test.
  • The general IMHO tests show inconclusive results. The random I/O is so costly (reflected in the wall time) that I have doubts as to whether it would be a win. The multiple sequential reads of a large mailbox done by UW are pretty quick compared to just a few random writes.

Notes on IMHO

I noticed some strange hangs when I first started using IMHO with Dovecot, where IMHO would just hang on login and never get anywhere.

I watched syslog for a bit, and found that IMHO does not log you out of the IMAP daemon when you log out; the Roxen IMAP timeout is what finally logs you out. So, my hang was happening because I was switching IMAP servers on Roxen, and it was trying to talk to the old server through what it thought were still-existing connections.

If I restart Roxen when I change IMAP daemons, everything seems to work properly.

Future Directions

I would like to add some more tests and daemons to this list. I also need to find (or write) a disk I/O measurement tool for Solaris that will allow me to judge the relative costs of sequential reads over the random writes so that I can give a "score" to a benchmark in addition to the "wall time".
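One simple form such a score could take is a weighted sum of the read and write volumes. The sketch below is purely hypothetical: the function name is made up, and the weights are required arguments because no relative costs have been measured here.

```python
def io_score(read_kb, write_kb, read_weight, write_weight):
    """Weighted I/O cost: lower is better.

    The weights are placeholders for measured relative costs of
    sequential reads and random writes on the disk under test.
    """
    return read_kb * read_weight + write_kb * write_weight
```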