It has been observed that forcing cache flushes on some systems that do not have split caches, can cause run times to increase noticeably in programs that allocate a great deal (because cache flushing is performed during garbage collection). Run times increase both in the user and system code. For example, running on one particular machine with a non-hw or specialized flush instruction, the basic LL1 benchmark runs in 25.3+4.0 with cache flushing on and in 22.0+0.0 with cache flushing turned off (numbers for illustration only).
The systems where this has so far been observed are:
I am surprised that the cache flush is slow on this one; it may be that the IFLUSH means something special because it is a multiprocessor.