[HEADS-UP] Rawhide: /tmp is now on tmpfs

Wed Jun 20 20:57:49 UTC 2012

On 06/20/2012 02:16 PM, Gregory Maxwell wrote:
> On Wed, Jun 20, 2012 at 1:54 PM, Jef Spaleta <jspaleta at gmail.com> wrote:
>> On Wed, Jun 20, 2012 at 9:41 AM, Gregory Maxwell <gmaxwell at gmail.com> wrote:
>>> Tmpfs volumes have a size set as a mount option. The default is half
>>> the physical ram (not physical ram plus swap). You can change the size
>>> with a remount. When it's full, it's full, like any other filesystem.
>> Okay that was what I was missing. Pegging the tmpfs to some percentage
>> of available ram by default.
>>
>> Follow up question.. is this value defined at install time or is it
>> 50% of ram as seen at boot up?
>> If I add or remove ram between boot ups, post-install does the tmpfs
>> size automatically adjust to 50% of what is available at boot up?
> It's 50% of what is available at bootup by default (e.g. this is what
> you get when you provide no size option while mounting). I'm not sure
> what the systemd stuff is doing, I had the impression it was leaving
> this as a default.  I don't know if this is a good thing or not.
>
> On Wed, Jun 20, 2012 at 1:56 PM, Brian Wheeler <bdwheele at indiana.edu> wrote:
>> I don't think it's just a matter of quantity of I/O but _when_ the I/O
>> happens.  Instead of the pagecache getting flushed to disk when it is
>> convenient for the system (presumably during a lull in I/O) the I/O is
>> concentrated when there is a large change in the VM allocations -- which
>> makes it very similar to a thrashing situation.
>>
>> With a real filesystem behind it, the pages can just be discarded and reused
>> when needed (providing they've been flushed) but in the case of tmpfs the
>> pages only get flushed to swap when there is memory pressure.
> An anecdote is not data, but I've never personally experienced
> negative "thrashing" behavior from high tmpfs usage.  I suppose
> thrashing only really happens when there is latency sensitive
> competition for the IO, and the kernel must be aggressive enough to
> avoid that.

I was pretty sure that on the internet an anecdote == data. :)

> When data is written to file systems normally the bulk will also
> remain in the buffer cache for some span of time until there is
> memory pressure.  The difference is how long it can remain (tmpfs has
> no mandatory flush) before being backed by disk, how much extra
> overhead there is from maintaining metadata (less for tmpfs than
> persistent file systems), and how much must be written right away to
> keep the fs consistent (none for tmpfs).

Perhaps, but if you've dumped a big file to /tmp on a real filesystem
and then a minute or two later you start up something large, it's probable
that the kernel has already flushed the data to disk and the pagecache has
easily discardable pages to use for new data coming in.  Under tmpfs the
write to swap is forced at page-discard time, which is also when things
are being read into the system.

But in any case the I/O advantages have never been shown, despite 
multiple requests by myself and others.

> On Wed, Jun 20, 2012 at 2:06 PM, Brian Wheeler <bdwheele at indiana.edu> wrote:
>> So the default is that I can use 2G in /tmp regardless of how much swap is
>> present if the system memory size is 4G?  So the only way to get more /tmp
>> is to either mess with the max% or buy more ram?
> On systems where tmpfs is provisioned for /tmp in fstab you change a
> setting to get more space (provide size=fooG mount option).  This is
> easier than adding more space to tmp when tmp is on root or some other
> file system.
Well, yes and no. You also have to make sure you have enough backing
swap or you're screwing yourself out of usable ram.  The problem here is
that the amount of /tmp is small by default, so tinkering with sizes is
actually more likely to be required than it was before.
And moving the requirement for "large files" (for some value of 'large'
which depends on your memory configuration) to /var/tmp is just moving
the goalposts and not actually solving anything.
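
To make the sizing concern concrete, here is a minimal Python sketch. It
is an illustration only, not anything the kernel or systemd ships. It
parses text in the /proc/meminfo format, computes the default tmpfs size
of half of physical RAM, and gives a lower bound on how much RAM a
completely full tmpfs must occupy for a given amount of swap. The
function names and the sample numbers are assumptions made up for this
example, and the bound assumes all swap is free for tmpfs pages.

```python
def parse_meminfo(text):
    """Return {field name: value in bytes} from /proc/meminfo-style text."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if not parts:
            continue
        value = int(parts[0])
        # Most fields carry a "kB" unit; a few are plain counts.
        if len(parts) > 1 and parts[1] == "kB":
            value *= 1024
        fields[name.strip()] = value
    return fields


def default_tmpfs_size(mem_total):
    """Default when no size= option is given: half of physical RAM."""
    return mem_total // 2


def ram_pinned_when_full(tmpfs_size, swap_total):
    """Lower bound on RAM a full tmpfs occupies if all swap is free for it."""
    return max(0, tmpfs_size - swap_total)


# Hypothetical machine: 4G of RAM, 1G of swap.
SAMPLE = """MemTotal:        4194304 kB
SwapTotal:       1048576 kB
"""

if __name__ == "__main__":
    info = parse_meminfo(SAMPLE)
    size = default_tmpfs_size(info["MemTotal"])
    pinned = ram_pinned_when_full(size, info["SwapTotal"])
    print("default tmpfs size:", size)
    print("RAM pinned when /tmp is full:", pinned)
```

On a real system the text would come from reading /proc/meminfo instead
of SAMPLE. Any size= value larger than SwapTotal gives a nonzero result
from ram_pinned_when_full, which is the "screwing yourself out of usable
ram" case.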

>
> I don't know how it will be set in systemd. Regardless of what systemd
> offers you could still toss in an option to remount it with more space
> after bootup.
>
> Buying more ram to increase /tmp is silly of course.  The default
> behavior is just a default; it doesn't imply some kind of cosmic
> relationship between your tmpfs size and the amount of physical ram.

Ah, but it is the default.  Because of this, there are going to be 
dumbass howto sites out there saying that Fedora is broken because it 
requires you to buy more RAM to get increased /tmp space -- no matter
how many times it is refuted here.

So I built a rawhide vm just now with 2G of ram and while it didn't move 
/tmp to tmpfs (maybe because it was an upgrade?), /run is in tmpfs and I 
did some experiments.  Yes, it did limit me to 1G when writing a file 
which is fine -- except that as a user that is substantially smaller 
than the (disk size - 6G) size that one would have had on /tmp if it was 
on /.  Which means that many users are going to have to mess with that 
setting in order to preserve their current workflow and/or solve goofy 
bugs.  And they're going to have to do it in a way that doesn't screw up 
their machines because they set it higher than they have backing storage.
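
The limit seen in that experiment can be checked without filling the
filesystem. This is a minimal sketch, assuming a Unix-like system where
Python's os.statvfs is available. The helper names fs_capacity and fits
are made up for illustration.

```python
import os


def fs_capacity(path):
    """Return (total_bytes, available_bytes) for the filesystem holding path."""
    st = os.statvfs(path)
    total = st.f_blocks * st.f_frsize
    # f_bavail counts the blocks an unprivileged user may still allocate.
    available = st.f_bavail * st.f_frsize
    return total, available


def fits(path, nbytes):
    """True if nbytes more could currently be written under path by a normal user."""
    return nbytes <= fs_capacity(path)[1]


if __name__ == "__main__":
    for mount in ("/", "/tmp", "/run"):
        if os.path.exists(mount):
            total, available = fs_capacity(mount)
            print(mount, "total:", total, "available:", available)
```

Comparing the total reported for /tmp or /run with the one for / shows
how much smaller the tmpfs limit is than what a user had when /tmp lived
on the root filesystem.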

It also means that every byte living in /tmp is a byte that cannot be 
used for cache or programs (except for caching the stuff in /tmp), so 
we're less memory efficient than before and swap/ram sizes will have to 
be larger to do what was done before.

I know that we've been told that this is a done deal and that everyone
should just get over it, but this is a feature that I think truly sucks
for a lot of reasons, and no _actual_ benefits have been demonstrated
for it, just lots of hand waving and anecdotes about how it works.

<sarcasm>
Maybe for F19 I'll submit a feature that requires all X apps to use
8-bit color (oooh, and private colormaps) since it will make network
rendering 3x faster and that's what Solaris used to do!  Don't ask any
questions, though, because you can't possibly understand and I know it
just works for me.
</sarcasm>