AES-NI is the new buzzword for fast cryptography. Or is it? This article will show that it improves throughput on some processors, but that the label alone does not guarantee an improvement.
AES-NI is used in IPFire to encrypt SSL/TLS and VPN traffic. It has other uses as well, but these are the major ones, and they also require high throughput if you are connected to the Internet over a fast link.
Each block (128 bits of data) that is to be encrypted or decrypted sits in memory and is loaded into a register of the processor. The key is also in memory. An AESENC (for encryption) or AESDEC (for decryption) instruction is then executed, which performs one round of AES. After that, the key for the next round is derived with the help of some other instructions (such as AESKEYGENASSIST) and the next round is performed. The final round uses a separate instruction, AESENCLAST or AESDECLAST. AES-128 requires 10 rounds; AES-256 requires 14.
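The round structure can be written down as a short sketch. It is an illustration only: the helper name `aesni_encrypt_sequence` is made up for this article, and the code merely lists which instruction handles which step for one block. It performs no encryption. It assumes the standard AES layout, in which the block is first XORed with the first round key and the last round is handled by AESENCLAST.

```python
# Standard AES round counts per key size (key bits -> rounds).
ROUNDS = {128: 10, 192: 12, 256: 14}


def aesni_encrypt_sequence(key_bits):
    """List the instructions that encrypt one 128-bit block (hypothetical helper)."""
    rounds = ROUNDS[key_bits]
    steps = ["PXOR"]                    # initial step: block XOR first round key
    steps += ["AESENC"] * (rounds - 1)  # every round except the last one
    steps.append("AESENCLAST")          # last round, which skips MixColumns
    return steps


for bits in (128, 256):
    seq = aesni_encrypt_sequence(bits)
    print(f"AES-{bits}: {seq.count('AESENC')} x AESENC + 1 x AESENCLAST")
```

Counting the entries shows why larger keys cost more time per block: AES-256 issues four more round instructions than AES-128 for the same 128 bits of data.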
So it is not one instruction that does it all. It is rather a set of instructions that each perform a different step. The idea behind this is that the instruction set remains usable even if the algorithm is slightly modified (e.g. longer keys or more rounds). The instructions are also used to accelerate tasks other than AES.
The downside is that this puts more load on the memory bus and requires more operations. Among those are copies of data from memory into the registers of the processor and back. Of course the operations are complex and take some time.
The faster the processor executes instructions, the faster a single round of AES is. Throughput usually scales roughly linearly with the clock speed, as there is little room to optimise further without using too much space on the die. Intel Xeon processors are likely to spend that die space, but for smaller processors this would cost too much energy.
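The relationship with clock speed can be made concrete with a little arithmetic. The sketch below is a simplified model, not a measurement: the function name `throughput_mbit` and the figure of 4 cycles per byte are assumptions chosen purely for illustration, and real processors differ in how many cycles they need per byte.

```python
def throughput_mbit(clock_hz, cycles_per_byte):
    """Estimated AES throughput of one core in Mbit/s (simplified model)."""
    bytes_per_second = clock_hz / cycles_per_byte
    return bytes_per_second * 8 / 1e6


# Hypothetical figures: the same cost per byte at two different clock speeds.
for clock in (1.0e9, 2.0e9):
    print(f"{clock / 1e9:.1f} GHz -> {throughput_mbit(clock, 4.0):.0f} Mbit/s")
```

With the cost per byte held constant, doubling the clock doubles the estimate. A better implementation on the die would show up as a lower cycles-per-byte figure instead.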
If the connection between memory and processor is slow, instructions have to wait for their data, which delays their execution and lowers throughput.
In order to find out whether AES-NI deserves a place on the must-have list for your next IPFire appliance, we are going to run some benchmarks.
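Readers who want to try a comparison of this kind on their own hardware could start from the sketch below. It is not necessarily the method used for this article. It assumes an x86 system with the `openssl` command-line tool installed, and it relies on OpenSSL's `OPENSSL_ia32cap` environment variable to hide AES-NI from the library for the second run. The function names are made up for this example.

```python
import os
import subprocess

# Bits 57 (AES-NI) and 33 (PCLMULQDQ) of OpenSSL's x86 capability vector.
AESNI_MASK = (1 << 57) | (1 << 33)


def benchmark_env(use_aesni):
    """Return an environment for `openssl speed`, optionally hiding AES-NI."""
    env = dict(os.environ)
    if not use_aesni:
        # A leading "~" tells OpenSSL to clear the listed capability bits.
        env["OPENSSL_ia32cap"] = "~" + hex(AESNI_MASK)
    return env


def run_benchmark(cipher="aes-128-cbc", use_aesni=True):
    """Run `openssl speed` for one cipher and return its text output."""
    cmd = ["openssl", "speed", "-elapsed", "-evp", cipher]
    result = subprocess.run(cmd, env=benchmark_env(use_aesni),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

Calling `run_benchmark(use_aesni=True)` and `run_benchmark(use_aesni=False)` on the same machine and comparing the reported rates gives a first impression of how much the instructions help on that particular processor.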
So it is obvious that one has to be careful with the AES-NI label on low-end hardware. A similar processor without AES-NI might be as fast as, or even faster than, one that comes with a poor implementation of AES-NI. So the advice is not only to check whether AES-NI exists, but also to consult benchmarks to see whether it is actually faster in this class of processors.
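Checking whether a processor advertises AES-NI at all is the easy part. A minimal sketch, assuming a Linux system on x86 where `/proc/cpuinfo` lists the CPU feature flags and AES-NI appears as the flag `aes`; both function names are invented for this example:

```python
def has_aesni(cpuinfo_text):
    """Return True if a 'flags' line in /proc/cpuinfo text lists 'aes'."""
    for line in cpuinfo_text.splitlines():
        name, _, value = line.partition(":")
        if name.strip() == "flags" and "aes" in value.split():
            return True
    return False


def read_cpuinfo(path="/proc/cpuinfo"):
    """Read the Linux CPU information file (x86 Linux only)."""
    with open(path) as f:
        return f.read()
```

On such a system, `has_aesni(read_cpuinfo())` reports whether the flag is present. It says nothing about how fast the implementation is, which is exactly why benchmarks are still needed.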
Please note that the AES-NI throughput is not equal, or even roughly equal, to the VPN throughput. Unfortunately, in most cases a VPN connection is processed on only a single core, which must compute integrity, too. The lower the performance of this core, the less time is left for AES-NI to sustain a good throughput. A processor with a higher clock speed and higher single-core performance is highly advantageous in this situation.
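A rough way to see the effect is to treat encryption and integrity as two steps that run one after the other on the same core, with no overlap. That is a deliberately simplified assumption, and the rates below are hypothetical numbers, not benchmark results. The function name `combined_throughput` is made up for this sketch.

```python
def combined_throughput(cipher_rate, integrity_rate):
    """Throughput when encryption and integrity share one core, back to back.

    Each byte costs 1/cipher_rate + 1/integrity_rate seconds, so the
    combined rate is the product of both rates divided by their sum.
    """
    return cipher_rate * integrity_rate / (cipher_rate + integrity_rate)


# Hypothetical rates in Mbit/s: fast AES-NI, slower integrity check.
print(combined_throughput(2000.0, 500.0))
```

In this model the slower step dominates: even a very fast AES-NI implementation cannot lift the result above the rate of the integrity computation on the same core.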