On 05/24/2011 03:46 PM, Matt Sealey wrote:
Hi Gordan,
On Tue, May 24, 2011 at 5:51 AM, Gordan Bobic<gordan(a)bobich.net> wrote:
> Just looking at the specsheet of the Freescale i.MX515, and this jumped
> out at me:
>
> Symmetric/Asymmetric Hashing and Random Accelerator (SAHARA) Lite is a
> cryptographic acceleration engine security co-processor
>
> Implements:
> * Block encryption algorithms (AES, DES, and 3DES)
> * Hashing algorithms (MD5, SHA-1, SHA-224, and SHA-256)
> * Stream cipher algorithm (ARC4)
> * True hardware random number generator (TRNG)
>
> Does anybody know at what kernel version the support for this was added
> (if it has already been added)?
>
> And since I know the Genesi guys read this list, does the Kernel+OpenSSL
> combo that comes with Efika have this enabled as standard? (I lent my
> smartbook to somebody for a few days hence why I'm asking rather than
> just checking - I thought I'd get a head start on trying to get this
> working in the same way as it does on the Kirkwood (SheevaPlug)).
>
> I also notice there is this in the i.MX515:
> Security Controller (SCC) type 2
> * AES engine
> * Secure/Non-Secure RAM
> * Support for multiple keys and TZ/non-TZ separation
>
> Does this mean there are two independent AES crypto co-processors in
> there? What about kernel support?
There is kernel support for Freescale's generic test-based "SHW"
interface which is mostly a userspace interface to the kernel, but no
support for anything like cryptodev (since there is no in-kernel
cryptoapi support for Sahara).
We were planning on working on it in the near future. It's important
to us but not top of the list. It would be super useful for ecryptfs
swap and home directories. Freescale said they're going to support
cryptoapi in the kernel with the MX6 and so on and so forth, for
various reasons, and their crypto teams do have working cryptoapi
implementations for Talitos (PowerQUICC/QorIQ) already so it is not as
if they are ignorant of the need for it. It just never got done for
Sahara and we have decided to pick up the slack when we have time.
For some reason I find it really shocking that some manufacturers go to
the effort of designing the hardware and then don't make sure that the
suitable software support is available as soon as the hardware starts
sampling...
Problem: cryptodev is an inefficient API (lots of memory copies) as
would be any crypto exposure from userspace to kernelspace, so the
actual performance even of the Kirkwood engine leaves a lot to be
desired and is far from the hardware performance. This is why Intel
and Via implemented it with CPU instructions - no kernel marshalling
required, you just do it.
Indeed, I am aware of that, but that just makes AES calculations faster,
it doesn't actually offload them. The CPU cannot do other things while
the crypto is crunching in the background. As the figures in the article
I posted show, the big win isn't so much that it runs faster - it does,
but it's a factor or so, not orders of magnitude (on 8KB blocks it was
hitting about 60% of its rated 300Mb/s - and I'd expect that to go up
with bigger blocks). The big win is that it leaves the CPU idle and free
to get on with other things. It's more like "free crypto" rather than
"faster crypto". :)
BTW as an alternative to cryptodev why doesn't Fedora make the leap
like it did with systemd and go for the new netlink crypto interface?
As long as the kernel has a cryptoapi driver and the encryption
software in userspace utilizes the new netlink interface, there's no
need for merging cryptodev or DKMS module compilation messes.
I really don't think dkms-ing cryptodev is that messy or difficult, but
I'll reserve the right to be wrong until I have it working. :)
It's already mainline in 2.6.39 AFAIR. Someone will need to kick OpenSSL
and any other encryption libraries into shape to get it to work, but I
suspect someone has already done that somewhere.. it would also be a
dependency-less way to get it into the coreutils binaries.
I like that idea, but considering how long it has taken to get cryptodev
support into OpenSSL and working on Linux, I don't see it happening
overnight - especially since even the bleeding edge Fedora doesn't even
include support for cryptodev as standard. It may be a step too far to
expect soon. Worthy of pushing for, for sure - but it'll take a lot more
work, and we can have cryptodev _now_. Even though it isn't as efficient
as it could be, it's still better than software-only.
It would still suffer from the same memory copy issue as cryptodev
though. The "0% cpu usage" you see is probably not taking into account
the time wasted doing the userspace to kernel part.
As I said elsewhere, with 8KB blocks, the CPU idle time goes to about
70% (it's firmly at 0% when using tiny blocks, and the performance
figures reflect that - it's not faster with small data blocks), and yes,
that is nowhere near the near-0 CPU time OpenSSL reports, but it's still
a non-trivial improvement that's worth having for any realistic
use-case. For example:
1) On ssh running top in a 132x40 xterm, that is at least 5280 bytes per
refresh. top -n1 seems to be 6416 bytes in the dump, presumably due to
escape characters. This compresses down to 1045 bytes, which is
certainly past the point where we start to win on crypto offload.
2) Currently the average size of an "object" on the internet is about
16KB or so, IIRC, and at these sizes we are way above the curve when it
comes to things like HTTPS SSL offload.
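As a rough way to reason about where request size starts to pay off, here is a back-of-the-envelope sketch in Python. It assumes, purely for illustration, that every request pays one fixed setup cost (syscall, copy, engine setup) regardless of size, and that the "60% of rated speed at 8KB" figure mentioned earlier is explained entirely by that cost. Neither assumption has been measured, and the function names are made up.

```python
def overhead_equiv_bytes(block_bytes, fraction_of_rated):
    """Fixed per-request cost, expressed as the number of bytes the engine
    could have processed at its rated speed in that time.

    ASSUMPTION: one constant cost per request, independent of request size.
    """
    if not 0.0 < fraction_of_rated <= 1.0:
        raise ValueError("fraction_of_rated must be in (0, 1]")
    return block_bytes * (1.0 / fraction_of_rated - 1.0)


def expected_fraction(block_bytes, overhead_bytes):
    """Fraction of the rated throughput reached for a given request size
    under the fixed-overhead model above."""
    return block_bytes / (block_bytes + overhead_bytes)


if __name__ == "__main__":
    # Fit the model to the one data point quoted in this thread: 60% at 8KB.
    overhead = overhead_equiv_bytes(8192, 0.60)
    # Sizes taken from the examples above: a compressed top refresh,
    # the fitted 8KB point, and a ~16KB web object.
    for size in (1045, 8192, 16 * 1024):
        pct = 100.0 * expected_fraction(size, overhead)
        print("%6d bytes -> %.0f%% of rated speed" % (size, pct))
```

Under this model the 8KB case reproduces the 60% it was fitted to and larger requests move closer to the rated speed. The figures for small requests are only as good as the fixed-cost assumption, and the model says nothing about CPU idle time, which is the main win being argued for here.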
Finally - if the copy overhead is still there, then what is the
advantage? Is there, perhaps, a better way of doing it, using DMA, and
just passing a pair of pointers to the coprocessor?
What would make it
clear is if you could run some kind of web benchmark over SSL where
data was being streamed in - if it's faster (objectively running
Javascript or an HTML5 demo or something that deals with heavy network
load which would need to be encrypted and decrypted) then you are onto
a winner, otherwise it may be that it is just hiding the work. How
exactly would we determine whether the security engine is really
making a difference to performance?
Well, I'd say the CPU idle time going from 0% to 70% is a reasonable
start, but yes, I completely agree, something like apachebench over SSL
is probably the way forward. I'll look into it when I have a chance.
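For the idle-time side of such a test, a minimal Python sketch of how the idle percentage could be sampled around a benchmark run is below. It assumes a Linux /proc/stat with the usual aggregate "cpu" line (user, nice, system, idle, iowait, irq, softirq, steal); the helper names are hypothetical, and the one-second sleep is only a stand-in for the real workload.

```python
import time


def parse_cpu_line(line):
    """Return the first eight counters from an aggregate 'cpu' line of
    /proc/stat: user, nice, system, idle, iowait, irq, softirq, steal.
    Later fields (guest time) are skipped because they are already
    counted in user/nice."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("not an aggregate cpu line: %r" % line)
    return [int(v) for v in fields[1:9]]


def idle_percent(before, after):
    """Percentage of CPU time spent idle between two 'cpu' samples."""
    deltas = [b - a for a, b in zip(parse_cpu_line(before),
                                    parse_cpu_line(after))]
    total = sum(deltas)
    if total <= 0:
        raise ValueError("samples are identical or out of order")
    return 100.0 * deltas[3] / total  # index 3 is the idle counter


def read_cpu_line(path="/proc/stat"):
    """Read the aggregate 'cpu' line (Linux only)."""
    with open(path) as f:
        return f.readline()


if __name__ == "__main__":
    before = read_cpu_line()
    time.sleep(1)  # placeholder: run the SSL benchmark here instead
    after = read_cpu_line()
    print("idle: %.1f%%" % idle_percent(before, after))
```

Taking one sample before and one after the same benchmark, once with the engine enabled and once with software-only crypto, would give a like-for-like comparison of idle time next to the throughput numbers.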
Gordan