AES New Instructions

AES-NI is the new buzzword for fast cryptography. Or is it? This article shows that it improves throughput on some processors, but that the label alone does not guarantee it.

AES-NI is used in IPFire to encrypt SSL/TLS and VPN traffic. There are other uses, but these are the major ones, and they require high throughput if you are connecting to the Internet over a fast connection.

Keep in mind

  • AES-CBC can only run on one core at a time (AES-GCM can take advantage of multiple cores but is less commonly used)
  • Encryption and decryption take exactly the same number of operations
  • VPNs do not only perform encryption, they also compute integrity checks
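
The first point follows from how CBC chains blocks: every block is mixed with the previous ciphertext block before it is encrypted, so block n cannot be started before block n-1 is finished. The following is a minimal sketch of that chaining only. The block function toy_encrypt_block is a hypothetical stand-in and is NOT AES (and not secure); all names are made up for illustration.

```python
BLOCK = 16  # AES block size in bytes (128 bits)


def xor(a: bytes, b: bytes) -> bytes:
    """XOR two equally long byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def toy_encrypt_block(block: bytes, key: bytes) -> bytes:
    """Hypothetical stand-in for one AES block encryption. NOT secure."""
    mixed = xor(block, key)
    return mixed[1:] + mixed[:1]  # rotate left by one byte


def toy_decrypt_block(block: bytes, key: bytes) -> bytes:
    """Inverse of toy_encrypt_block."""
    unrotated = block[-1:] + block[:-1]  # rotate right by one byte
    return xor(unrotated, key)


def cbc_encrypt(plaintext: bytes, key: bytes, iv: bytes) -> bytes:
    """CBC chaining: each block needs the ciphertext of the block before it."""
    assert len(plaintext) % BLOCK == 0, "sketch without padding"
    previous = iv
    out = []
    for i in range(0, len(plaintext), BLOCK):
        # This step cannot start before 'previous' has been computed.
        previous = toy_encrypt_block(xor(plaintext[i:i + BLOCK], previous), key)
        out.append(previous)
    return b"".join(out)


def cbc_decrypt(ciphertext: bytes, key: bytes, iv: bytes) -> bytes:
    """Undo the chaining block by block."""
    previous = iv
    out = []
    for i in range(0, len(ciphertext), BLOCK):
        block = ciphertext[i:i + BLOCK]
        out.append(xor(toy_decrypt_block(block, key), previous))
        previous = block
    return b"".join(out)


key = bytes(range(16))
iv = bytes(16)
message = b"A" * 48  # three identical blocks
ciphertext = cbc_encrypt(message, key, iv)
```

Changing a single byte in the first block of message changes every ciphertext block after it, which is why the work for one stream cannot simply be split across cores.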

How does AES-NI work?

Each block (128 bits of data) that is to be encrypted or decrypted is copied into memory. The key is also in memory. An AESENC (for encryption) or AESDEC (for decryption) instruction is then executed, which performs one round of AES. Next, the key for the following round is prepared with the help of some other instructions, and the next round is performed. AES-128 requires 10 rounds; AES-256 requires 14.
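
To make the round structure concrete, here is a small sketch that lists the instructions issued for a single block. It does not execute any AES-NI instruction; it only models the sequence. Two details go beyond the text above and are taken from the instruction set itself: the block is first XORed with the initial round key (PXOR), and the final round uses AESENCLAST or AESDECLAST instead of AESENC or AESDEC. The function name instruction_sequence is made up for this example.

```python
# Number of AES rounds per key size in bits
ROUNDS = {128: 10, 192: 12, 256: 14}


def instruction_sequence(key_bits: int, decrypt: bool = False) -> list:
    """Return the instructions issued to process one 128-bit block."""
    rounds = ROUNDS[key_bits]
    if decrypt:
        middle, last = "AESDEC", "AESDECLAST"
    else:
        middle, last = "AESENC", "AESENCLAST"
    # Initial XOR with round key 0, then rounds - 1 full rounds,
    # then one final round with the dedicated "last" instruction.
    return ["PXOR"] + [middle] * (rounds - 1) + [last]


for bits in (128, 256):
    seq = instruction_sequence(bits)
    print(f"AES-{bits}: {len(seq)} instructions per block, ending with {seq[-1]}")
```

The round keys themselves are prepared separately (the AESKEYGENASSIST instruction helps with that), which is not modelled here.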

So it is not one instruction that does it all, but rather a set of instructions that each perform a different step. The idea is that the instruction set remains usable even if the algorithm is slightly modified (e.g. longer keys or more rounds). The instructions can also be used to accelerate tasks other than AES.

Performance considerations

The downside of this design is that it puts more load on the memory bus and requires more operations, among them copying data from memory into the registers of the processor and back. The operations themselves are also complex and take some time.

Conclusion #1: The performance of AES-NI is highly dependent on clock speed

The faster the processor executes instructions, the faster a single round of AES completes. Throughput usually scales almost linearly with clock speed, as there is little room to optimise further without using too much space on the die. Intel Xeon processors are likely to spend that die space, but for smaller processors this would cost too much energy.

Conclusion #2: The performance of AES-NI is dependent on the bus speed

If the connection between memory and processor is slow, the execution of each instruction is delayed, which lowers throughput.


To find out whether AES-NI belongs on the must-have list for your next IPFire appliance, we ran some benchmarks.

Hardware         Processor               Clock Speed   AES-NI   AES throughput

Embedded Systems/SoCs
Duo              Intel Haswell N2957U    1.4 GHz       No       ~116 MByte/s
APU2B4/APU2C4    AMD GX-412TC            1.0 GHz       Yes      ~120 MByte/s

Desktop Processors
-                Intel Core i5 760       2.8 GHz       Yes      ~260 MByte/s (~86 MByte/s w/o AES-NI)
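
The figures in the table are easier to compare once they are normalised. The short sketch below uses only the numbers from the table; the helper names per_ghz and speedup are made up for this example, and dividing by clock speed is a rough comparison, not a precise measure.

```python
# (label, clock speed in GHz, AES throughput in MByte/s), copied from the table
results = [
    ("Duo, no AES-NI", 1.4, 116),
    ("APU2B4/APU2C4, AES-NI", 1.0, 120),
    ("Core i5 760, AES-NI", 2.8, 260),
    ("Core i5 760, no AES-NI", 2.8, 86),
]


def per_ghz(clock_ghz: float, mbyte_per_s: float) -> float:
    """Throughput normalised by clock speed."""
    return mbyte_per_s / clock_ghz


def speedup(with_aesni: float, without_aesni: float) -> float:
    """How many times faster the same processor is with AES-NI."""
    return with_aesni / without_aesni


for label, clock, rate in results:
    print(f"{label}: {per_ghz(clock, rate):.0f} MByte/s per GHz")

print(f"Core i5 760 speedup with AES-NI: {speedup(260, 86):.1f}x")
```

On the desktop processor AES-NI roughly triples throughput, while the embedded system with AES-NI is only marginally faster in absolute terms than the one without.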


  • AES-NI makes operations faster on the same processor. We have not seen any processor that is slower with AES-NI, or where the improvement on that processor is negligible.
  • There are low-end processors (and many SoCs) that implement AES-NI only in microcode, and on which the memory bus is the bottleneck.

So one has to be careful with the AES-NI label on low-end hardware. A similar processor without AES-NI might be as fast as, or even faster than, one that comes with a poor implementation of AES-NI. The advice is therefore not only to check whether AES-NI exists, but also to consult benchmarks to see whether it is actually faster in this class of processors.
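
Checking whether AES-NI exists can be scripted. The sketch below assumes a Linux system on x86, where /proc/cpuinfo lists the CPU features on a "flags" line and AES-NI shows up as the flag "aes". The function names are made up for this example.

```python
def has_aes_flag(cpuinfo_text: str) -> bool:
    """Return True if any 'flags' line in the given text contains 'aes'."""
    for line in cpuinfo_text.splitlines():
        name, _, value = line.partition(":")
        if name.strip() == "flags" and "aes" in value.split():
            return True
    return False


def read_cpuinfo(path: str = "/proc/cpuinfo") -> str:
    """Read the CPU description; returns an empty string if unavailable."""
    try:
        with open(path, encoding="utf-8") as handle:
            return handle.read()
    except OSError:
        return ""


if has_aes_flag(read_cpuinfo()):
    print("AES-NI is advertised by this processor")
else:
    print("No AES-NI flag found")
```

The flag only says that the instructions are available. It says nothing about how fast they are, so the benchmark comparison is still needed.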

Side notes

AES-NI is not VPN throughput

Please note that AES-NI throughput is not equal, or even roughly equal, to VPN throughput. Unfortunately, in most cases the VPN runs on only a single core, on which integrity must be computed too. The lower the performance of this core, the less time is left for AES-NI to sustain good throughput. A processor with a higher clock speed and higher single-core performance is highly advantageous in this situation.
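
A simple way to picture this is a time budget: if encryption and integrity run one after the other on the same core, the time spent per byte adds up. The sketch below is a simplified model under that assumption and ignores everything else a VPN does (packet handling, copying, the network stack). The integrity rate of 400 MByte/s is a hypothetical figure chosen for illustration, not a measurement.

```python
def combined_throughput(cipher_rate: float, integrity_rate: float) -> float:
    """Throughput when both steps share one core and run back to back.

    The time per byte is the sum of both steps, so the rates combine
    like 1 / (1/a + 1/b). Rates are in MByte/s.
    """
    return 1.0 / (1.0 / cipher_rate + 1.0 / integrity_rate)


aes_rate = 260.0        # AES throughput of the Core i5 760 from the table
integrity_rate = 400.0  # HYPOTHETICAL integrity throughput, for illustration

vpn_estimate = combined_throughput(aes_rate, integrity_rate)
print(f"Estimated upper bound: {vpn_estimate:.0f} MByte/s")
```

Even in this optimistic model the result is always lower than the slower of the two rates, which is why the AES figure alone overstates what a VPN can deliver.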

Further reading

FIXME - We need benchmarks that relate these results to real VPN throughput (IPsec and OpenVPN)


Older Revisions • August 24 at 8:56 pm • Jon