Why does disk I/O slow down a CPU-bound task?

Mon Mar 30 19:58:56 UTC 2015

I noticed on RHEL 6 that when a large amount of disk I/O is happening,
CPU-bound tasks "slow down". I have been able to reproduce it on Fedora 21
as well, and here are instructions for reproducing it with a simple test:

1) Build disk_test.cc (the "CPU-bound task") and run it.
2) Create a large file to copy ( fallocate -l 10G junk ).
3) Copy that file repeatedly, with a one-minute delay between copies
   ( while true; do cp junk junk2; sleep 60; done )

If you redirect the output of disk_test.cc to a file, you can plot the
results in gnuplot with the following commands and see the mean time
between completed work cycles (column 3, in nanoseconds) change while the
file is being copied:

    set xdata time
    set timefmt "%s"
    plot "out.txt" using 1:3 with lines

You can also see the load average going up, so it seems like something in
the kernel/scheduler is taking some sort of exclusive lock during the disk
I/O, and that's keeping the CPU-bound task from running when it should.
Any ideas?

Thanks,
Dave
-------------- next part --------------
#include <cerrno>
#include <cmath>
#include <cstring>
#include <iostream>
#include <time.h>

// Busy-work that burns CPU without touching the disk.
// Unsigned arithmetic is used because overflowing the signed (long)
// linear congruential update is undefined behavior in C++.
unsigned long do_work(unsigned long pseed)
{
    for (int bnum = 0; bnum < 300000; ++bnum)
        pseed = pseed * 1103515245UL + 12345UL;
    return pseed;
}

int main()
{
    // Track the time between "work cycles". CLOCK_REALTIME is kept so the
    // printed timestamps are Unix epoch seconds (what gnuplot's timefmt "%s"
    // expects); CLOCK_MONOTONIC would be immune to clock adjustments.
    timespec t, lastT;
    clock_gettime(CLOCK_REALTIME, &lastT);
    // And the next time an output should happen (every 10 seconds)
    time_t nextOutputT = lastT.tv_sec + 10;

    long n = 0;        // work cycles in the current interval
    double mean = 0;   // running mean of the cycle time (ns)
    double m2 = 0;     // running sum of squared deviations (Welford)
    unsigned long sum = 0;

    // n is reset after every output, so in practice this loops until the
    // process is killed.
    while (n < 1000000000) {
        // Do some work
        sum += do_work(lastT.tv_nsec);

        // Get the current time
        if (clock_gettime(CLOCK_REALTIME, &t) != 0) {
            std::cerr << "Error getting time: " << std::strerror(errno) << std::endl;
            return -1;
        }

        // Update the statistics of the time between "work cycles"
        long long dT = static_cast<long long>(t.tv_sec - lastT.tv_sec) * 1000000000LL
                     + (t.tv_nsec - lastT.tv_nsec);
        ++n;
        double delta = dT - mean;
        mean += delta / n;
        m2 += delta * (dT - mean);

        // If it's time to output the statistics
        if (t.tv_sec >= nextOutputT) {
            // Columns: epoch seconds, cycles, mean (ns), sample std dev (ns)
            double stddev = (n > 1) ? std::sqrt(m2 / (n - 1)) : 0.0;
            std::cout << t.tv_sec << ' ' << n << ' ' << mean << ' ' << stddev << std::endl;

            // Reset the statistics
            n = 0;
            mean = 0;
            m2 = 0;

            // And record the next time an output should happen
            nextOutputT = t.tv_sec + 10;
        }

        // Save the current time for calculating time between "work cycles"
        lastT = t;
    }

    // Return (part of) the result so the compiler can't optimize the work away;
    // the exit status only keeps the low bits anyway.
    return static_cast<int>(sum & 0x7f);
}