[fedora-arm] Hardware Crypto Offload on i.MX515 (Efika)

Tue May 24 15:29:03 UTC 2011

On 05/24/2011 03:46 PM, Matt Sealey wrote:
> Hi Gordan,
>
> On Tue, May 24, 2011 at 5:51 AM, Gordan Bobic <gordan at bobich.net> wrote:
>> Just looking at the specsheet of the Freescale i.MX515, and this jumped
>> out at me:
>>
>> Symmetric/Asymmetric Hashing and Random Accelerator (SAHARA) Lite is a
>> cryptographic acceleration engine security co-processor
>>
>> Implements:
>>      * Block encryption algorithms (AES, DES, and 3DES)
>>      * Hashing algorithms (MD5, SHA-1, SHA-224, and SHA-256)
>>      * Stream cipher algorithm (ARC4)
>>      * True hardware random number generator (TRNG)
>>
>> Does anybody know at what kernel version the support for this was added
>> (if it has already been added)?
>>
>> And since I know the Genesi guys read this list, does the Kernel+OpenSSL
>> combo that comes with Efika have this enabled as standard? (I lent my
>> smartbook to somebody for a few days, hence why I'm asking rather than
>> just checking - I thought I'd get a head start on trying to get this
>> working in the same way as it does on the Kirkwood (SheevaPlug).)
>>
>> I also notice there is this in the i.MX515:
>> Security Controller (SCC) type 2
>>      * AES engine
>>      * Secure/Non-Secure RAM
>>      * Support for multiple keys and TZ/non-TZ separation
>>
>> Does this mean there are two independent AES crypto co-processors in
>> there? What about kernel support?
>
> There is kernel support for Freescale's generic test-based "SHW"
> interface which is mostly a userspace interface to the kernel, but no
> support for anything like cryptodev (since there is no in-kernel
> cryptoapi support for Sahara).
>
> We were planning on working on it in the near future. It's important
> to us but not top of the list. It would be super useful for ecryptfs
> swap and home directories. Freescale said they're going to support
> cryptoapi in the kernel with the MX6 and so on and so forth, for
> various reasons, and their crypto teams do have working cryptoapi
> implementations for Talitos (PowerQUICC/QorIQ) already so it is not as
> if they are ignorant of the need for it. It just never got done for
> Sahara and we have decided to pick up the slack when we have time.

For some reason I find it really shocking that some manufacturers go to
the effort of designing the hardware and then don't make sure that
suitable software support is available as soon as the hardware starts
sampling...

> Problem: cryptodev is an inefficient API (lots of memory copies) as
> would be any crypto exposure from userspace to kernelspace, so the
> actual performance even of the Kirkwood engine leaves a lot to be
> desired and is far from the hardware performance. This is why Intel
> and Via implemented it with CPU instructions - no kernel marshalling
> required, you just do it.

Indeed, I am aware of that, but that just makes AES calculations faster,
it doesn't actually offload them. The CPU cannot do other things while
the crypto is crunching in the background. As the figures in the article
I posted show, the big win isn't so much that it runs faster - it does,
but by a modest factor, not orders of magnitude (on 8KB blocks it was
hitting about 60% of its rated 300Mb/s - and I'd expect that to go up
with bigger blocks). The big win is that it leaves the CPU idle and free
to get on with other things. It's more like "free crypto" rather than
"faster crypto". :)
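
To illustrate why I'd expect the figure to rise with bigger blocks, here is a rough sketch. It assumes a simple model in which every request pays a fixed overhead (syscall, copy, engine setup) on top of the transfer time at the rated speed. That model and the function name are assumptions for illustration, not measurements; the only input taken from the numbers above is "60% of rated at 8KB".

```python
def efficiency(block_bytes, ref_block=8192, ref_eff=0.60):
    """Fraction of the rated throughput reached at a given block size.

    Assumed model: time per request = fixed overhead + bytes / rated speed.
    The overhead is calibrated from one reference point (60% at 8 KB) and
    expressed as the number of bytes that could have been processed in
    that time at the rated speed.
    """
    overhead_bytes = ref_block * (1.0 / ref_eff - 1.0)
    return block_bytes / (block_bytes + overhead_bytes)


if __name__ == "__main__":
    for size in (1024, 8192, 16384, 65536):
        print(f"{size:6d} bytes: {efficiency(size) * 100:5.1f}% of rated speed")
```

Under this model the efficiency climbs towards 100% as the blocks grow, and falls off quickly for small blocks, which is the same shape as the figures I was seeing.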

> BTW as an alternative to cryptodev why doesn't Fedora make the leap
> like it did with systemd and go for the new netlink crypto interface?
> As long as the kernel has a cryptoapi driver and the encryption
> software in userspace utilizes the new netlink interface, there's no
> need for merging cryptodev or DKMS module compilation messes.

I really don't think dkms-ing cryptodev is that messy or difficult, but 
I'll reserve the right to be wrong until I have it working. :)

> It's
> already mainline in 2.6.39 AFAIR. Someone will need to kick OpenSSL
> and any other encryption libraries into shape to get it to work, but I
> suspect someone has already done that somewhere.. it would also be a
> dependency-less way to get it into the coreutils binaries.

I like that idea, but considering how long it has taken to get cryptodev
support into OpenSSL and working on Linux, I don't see it happening
overnight - especially since even bleeding edge Fedora doesn't include
support for cryptodev as standard. It may be a step too far to expect
soon. Worth pushing for, for sure - but it'll take a lot more work, and
we can have cryptodev _now_. Even though it isn't as efficient as it
could be, it's still better than software-only.
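
For reference, a minimal sketch of what using such an interface from userspace might look like. It assumes the interface in question is the kernel's socket-based AF_ALG family (the thread calls it "netlink", so that identification is an assumption), and it uses Python's standard socket module, which exposes AF_ALG on Linux builds. The helper name is made up for illustration, and it returns None where the kernel or the Python build lacks support.

```python
import hashlib
import socket


def sha256_via_kernel(data):
    """Hash `data` through the kernel crypto API, or return None if unavailable."""
    if not hasattr(socket, "AF_ALG"):
        return None  # not Linux, or Python built without AF_ALG support
    try:
        with socket.socket(socket.AF_ALG, socket.SOCK_SEQPACKET, 0) as alg:
            alg.bind(("hash", "sha256"))   # choose algorithm type and name
            op, _ = alg.accept()           # per-operation socket
            with op:
                op.sendall(data)           # data is copied into the kernel here
                return op.recv(32)         # SHA-256 digest is 32 bytes
    except OSError:
        return None  # kernel lacks AF_ALG or the requested algorithm


if __name__ == "__main__":
    msg = b"hello from userspace"
    digest = sha256_via_kernel(msg)
    if digest is None:
        print("AF_ALG not available here")
    else:
        print(digest.hex(), digest == hashlib.sha256(msg).digest())
```

Whether the kernel services the request in software or on a hardware engine depends on which drivers have registered the algorithm; the userspace code is the same either way. Note that the send is still a copy into the kernel.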

> It would still suffer from the same memory copy issue as cryptodev
> though. The "0% cpu usage" you see is probably not taking into account
> the time wasted doing the userspace to kernel part.

As I said elsewhere, with 8KB blocks, the CPU idle time goes to about 
70% (it's firmly at 0% when using tiny blocks, and the performance 
figures reflect that - it's not faster with small data blocks), and yes, 
that is nowhere near the near-0 CPU time OpenSSL reports, but it's still 
a non-trivial improvement that's worth having for any realistic 
use-case. For example:

1) On ssh running top in a 132x40 xterm, that is at least 5280 bytes per 
refresh. top -n1 seems to be 6416 bytes in the dump, presumably due to 
escape characters. This compresses down to 1045 bytes, which is 
certainly past the point where we start to win on crypto offload.

2) Currently the average size of an "object" on the internet is about
16KB or so, IIRC, and at these sizes we are way above the curve when it
comes to things like HTTPS SSL offload.
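
The arithmetic in example 1 can be reproduced roughly as follows. This is only a sketch: the screen content is synthetic filler made up for illustration (not a real top dump), so the compressed size it prints will not match the 1045 bytes quoted above. It only shows the 132x40 lower bound and that a screenful of this kind compresses well.

```python
import zlib

COLS, ROWS = 132, 40


def synthetic_screen(cols=COLS, rows=ROWS):
    """Build a fake full-screen refresh: `rows` lines padded to `cols` chars."""
    lines = []
    for pid in range(rows):
        line = f"{1000 + pid:5d} user  20   0  12345  6789  S  0.{pid % 10}  proc{pid}"
        lines.append(line.ljust(cols)[:cols])
    return "".join(lines).encode("ascii")


if __name__ == "__main__":
    raw = synthetic_screen()
    packed = zlib.compress(raw)
    print("raw bytes per refresh:", len(raw))
    print("after zlib compression:", len(packed))
```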

Finally - if the copy overhead is still there with the new interface,
then what is its advantage over cryptodev? Is there, perhaps, a better
way of doing it, using DMA, and just passing a pair of pointers to the
coprocessor?

> What would make it
> clear is if you could run some kind of web benchmark over SSL where
> data was being streamed in - if it's faster (objectively running
> Javascript or an HTML5 demo or something that deals with heavy network
> load which would need to be encrypted and decrypted) then you are onto
> a winner, otherwise it may be that it is just hiding the work. How
> exactly would we determine whether the security engine is really
> making a difference to performance?

Well, I'd say the CPU idle time going from 0% to 70% is a reasonable 
start, but yes, I completely agree, something like apachebench over SSL 
is probably the way forward. I'll look into it when I have a chance.
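
In the meantime, one cheap way to check whether the work is really being offloaded rather than hidden is to compare wall-clock time against the CPU time charged to the process, including time spent in the kernel on its behalf. The sketch below does that for a software SHA-256 loop using only the Python standard library; the function names and block sizes are arbitrary choices for illustration. With pure software crypto the idle fraction should sit near zero. With a working offload it should rise, and any userspace-to-kernel copying would still show up as CPU time.

```python
import hashlib
import time


def measure(block_size, total_bytes=8 * 1024 * 1024):
    """Hash `total_bytes` in `block_size` chunks; return (wall_s, cpu_s)."""
    block = bytes(block_size)
    rounds = max(1, total_bytes // block_size)
    wall_start = time.perf_counter()
    cpu_start = time.process_time()   # user + system CPU time of this process
    for _ in range(rounds):
        hashlib.sha256(block).digest()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return wall, cpu


def idle_fraction(wall, cpu):
    """Share of wall-clock time during which the process used no CPU."""
    if wall <= 0:
        return 0.0
    return max(0.0, 1.0 - cpu / wall)


if __name__ == "__main__":
    for size in (16, 1024, 8192, 65536):
        wall, cpu = measure(size)
        print(f"{size:6d} B blocks: wall {wall:.3f}s cpu {cpu:.3f}s "
              f"idle {idle_fraction(wall, cpu) * 100:.0f}%")
```

This only measures the process doing the crypto, so it complements rather than replaces an end-to-end benchmark over SSL.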

Gordan