Retire a package from Fedora i686 (not x86_64)

Germano Massullo germano.massullo at gmail.com
Sat Nov 7 09:19:39 UTC 2015


A few lines of IRC chat from Freenode #darktable. Hanatos is the founder
of the Darktable project.


[09:17] <hanatos_> ``requiring SSE3 is not really allowed ''
[09:17] <hanatos_> so much bundled cluelessness :/
[09:19] <hanatos_> Germano: re 32-bit
[09:19] <hanatos_> the sse thing is one thing
[09:20] <hanatos_> the other is the very limited virtual address space
(2G really)
[09:20] <hanatos_> everybody coding anything half way serious will tell
you the same story
[09:20] <hanatos_> (rawtherapee has the same issues iirc)
[09:20] <hanatos_> our old cache was allocing one big chunk of memory at
startup and maintained it manually
[09:20] <hanatos_> essentially duplicating a poor man's malloc,
specialised for our thumbnail caches
[09:21] <hanatos_> the new cache is much faster and easier to read
[09:21] <hanatos_> but based on malloc/free
[09:21] <hanatos_> which means your virtual address space (not the
physical one mapping to your ram)
[09:21] <hanatos_> will get fragmented and you quickly start addressing
blocks above the 10G range
[09:22] <hanatos_> which may not be a problem, even on systems with only
2G of physical ram, because blocks have been freed in between. it's just
on 32-bit systems you can't address it any more and die
[09:22] <boucman> basically, at this point, DT makes no sense on x86,
except maybe dt-cli
[09:22] <hanatos_> which is a similar argument as the sse3 is.
[09:22] <hanatos_> it's just not a worthwhile experience running this
software on this kind of hardware
[09:22] <hanatos_> boucman: yes, that.
[09:23] <Artefact2> hanatos_: I think jmalloc is also more clever than
glibc malloc wrt fragmentation. that's why blender uses it, afaik
[09:24] <Artefact2> *je
[09:24] <hanatos_> Germano: so i'd like to contradict the `upstream
doesn't care' bit
[09:25] <hanatos_> upstream does care.
[09:25] <hanatos_> just not about random principles and guidelines
[09:25] <hanatos_> but about how well darktable runs
[09:25] <hanatos_> Artefact2: jemalloc you mean
[09:25] <Artefact2> hanatos_: yes
[09:25] <hanatos_> yes, it's mostly multithreaded/
[09:25] <hanatos_> block per thread
[09:25] <hanatos_> might be worthwhile when running many threads for
thumbnail gen
[09:25] <hanatos_> but honestly i doubt it
[09:26] <hanatos_> it speeds up another piece of code i wrote
[09:26] <hanatos_> which uses many 10s of 1000s of malloc calls per second..
[09:26] <hanatos_> we don't do that in dt
[09:26] <hanatos_> (or tcmalloc for that matter)
[09:26] <hanatos_> simple enough to try with an LD_PRELOAD
[09:26] <Artefact2> oh yeah. calling malloc this many times is a bad
idea anyway
[09:32] <hanatos_> the alternative would have been allocate ridiculous
amounts of memory up front
[09:32] <hanatos_> bad idea, too
[09:32] <hanatos_> but if you have a better solution i'd sure like to
hear it :)
[09:32] <hanatos_> the problem is to construct a binary search tree
[09:32] <Artefact2> i'm not a memory guru, sadly :|
[09:32] <hanatos_> in parallel
[09:32] <hanatos_> so you start at the root and push the children as new
jobs (malloc job_t)
[09:32] <hanatos_> and so on
[09:33] <hanatos_> it's millions of nodes total, so you don't want to
allocate them up front
[09:33] <Artefact2> maybe a compromise. allocate a pool that can store,
say 10 jobs at a time
[09:34] <hanatos_> (and yes, i would agree.. calling malloc is almost
always a bad idea, unless you can't avoid it)
[09:34] <hanatos_> but see.. that pool per thread.. that's exactly what
jemalloc/tcmalloc do
[09:34] <Artefact2> this way you reduce the allocator load by a factor
of 10, while still not allocating huge amounts of contiguous memory
[09:35] <Artefact2> maybe the issue is elsewhere. what are you doing
millions of? is it possible to make "bigger" jobs and have less of them?
ie a smaller tree
[09:38] <hanatos_> nope, can't touch the tree
[09:38] <hanatos_> its some spatial acceleration structure for ray tracing
[09:39] <hanatos_> it's been optimised for fast ray tracing for many years
[09:39] <Artefact2> are we still talking about darktable? didn't know it
needed a raytracer
[09:40] <hanatos_> no, different piece of code.. as i said, i don't
think darktable needs thread-cached malloc
[10:11] <hanatos_> Germano: also feel free to refer those guys here to
us if they have questions. seems to me that some direct contact may be
better.
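
For readers who want to see what Artefact2's "pool that can store, say 10
jobs at a time" suggestion could look like, here is a minimal sketch in C.
It is not code from darktable or from hanatos' ray tracer. The job_t layout,
the job_pool_t type, and the pool_get/pool_put/pool_destroy names are all
made up for illustration; only the name job_t and the figure of 10 come from
the chat. The sketch is single-threaded, so each worker thread would need
its own pool, which is roughly the per-thread caching that hanatos says
jemalloc and tcmalloc already provide.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define JOBS_PER_CHUNK 10 /* the "say 10 jobs" figure from the chat */

/* Hypothetical job record; the real job_t is not shown in the chat. */
typedef struct job_t {
  struct job_t *next_free; /* links unused slots into a free list */
  int node_id;             /* placeholder payload */
} job_t;

/* One malloc() call yields JOBS_PER_CHUNK job slots. */
typedef struct chunk_t {
  struct chunk_t *next;
  job_t jobs[JOBS_PER_CHUNK];
} chunk_t;

typedef struct job_pool_t {
  chunk_t *chunks;     /* every chunk allocated so far */
  job_t *free_list;    /* slots ready to be handed out */
  size_t malloc_calls; /* counts how often malloc() was really hit */
} job_pool_t;

/* Hand out a job slot, growing the pool by one chunk only when empty. */
static job_t *pool_get(job_pool_t *p) {
  if (p->free_list == NULL) {
    chunk_t *c = malloc(sizeof *c);
    if (c == NULL) return NULL;
    p->malloc_calls++;
    c->next = p->chunks;
    p->chunks = c;
    for (int i = 0; i < JOBS_PER_CHUNK; i++) {
      c->jobs[i].next_free = p->free_list;
      p->free_list = &c->jobs[i];
    }
  }
  job_t *j = p->free_list;
  p->free_list = j->next_free;
  return j;
}

/* Return a finished job to the pool instead of calling free(). */
static void pool_put(job_pool_t *p, job_t *j) {
  assert(j != NULL);
  j->next_free = p->free_list;
  p->free_list = j;
}

/* Release every chunk; all jobs from this pool become invalid. */
static void pool_destroy(job_pool_t *p) {
  while (p->chunks != NULL) {
    chunk_t *next = p->chunks->next;
    free(p->chunks);
    p->chunks = next;
  }
  p->free_list = NULL;
}
```

A zero-initialised job_pool_t is a valid empty pool. Memory is requested in
small chunks rather than one large contiguous block, and slots returned with
pool_put are reused before any new chunk is allocated, so the number of
malloc calls drops by about the chunk size.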

