Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a private key provider to accelerate RSA and ECDSA crypto operations on recent Intel Xeon processors. #15871

Closed
ipuustin opened this issue Apr 7, 2021 · 9 comments · Fixed by #17826
Labels
area/tls design proposal Needs design doc/proposal before implementation stale stalebot believes this issue/PR has not been touched recently

Comments

@ipuustin
Copy link
Member

ipuustin commented Apr 7, 2021

Title: Add a private key provider to accelerate RSA and ECDSA crypto operations on recent Intel Xeon processors.

Description:

Intel's IPP (Integrated Performance Primitives) crypto library has support for multi-buffer crypto operations. Briefly, multi-buffer cryptography is implemented with AVX-512 instructions using a SIMD (single instruction, multiple data) mechanism. Up to eight RSA or ECDSA operations are gathered together into a buffer and processed with a single instruction, providing potentially improved performance. The AVX-512 instructions are available on recently launched Intel Ice Lake Server processors.

We have an Envoy private key provider in the works which we would like to submit as a PR soon. There are a few opens which we would like to get feedback on:

  1. Testing. The functionality can be tested only on the Xeon processors which support the instruction set. We can test this by adding tests which are only run on processors which have the required AVX512 instructions, or alternatively we can mock the library interface and just simulate running the crypto operations in parallel.
  2. External dependency. One extra external dependency (ipp-crypto library) is added to Envoy build. The library itself is pretty straightforward, but it needed to be slightly patched in order for it to compile against BoringSSL instead of OpenSSL. We are however working with the ipp-crypto library team to get the patches merged upstream.

Relevant Links:

https://en.wikipedia.org/wiki/Integrated_Performance_Primitives
https://software.intel.com/content/www/us/en/develop/tools/oneapi/components/ipp.html

@ipuustin ipuustin added enhancement Feature requests. Not bugs or questions. triage Issue requires triage labels Apr 7, 2021
@snowp
Copy link
Contributor

snowp commented Apr 7, 2021

For testing is there any way we can run these tests in CI? Not sure how much control we have over processor types in CI. @lizan do you have thoughts on our options here?

@envoyproxy/dependency-shepherds for thoughts on adding the dependency.

@snowp snowp added design proposal Needs design doc/proposal before implementation and removed triage Issue requires triage labels Apr 7, 2021
@mattklein123 mattklein123 added area/tls and removed enhancement Feature requests. Not bugs or questions. labels Apr 7, 2021
@mattklein123
Copy link
Member

If the dep is only uses by the extension I think it's OK with me, but will see if @htuch or @moderation have any concerns.

In terms of testing, we have other tests that check for hardware/OS compat before running, though it might be good to also mock to make sure we have good coverage as I'm not sure of CI status and it might change over time.

@lizan
Copy link
Member

lizan commented Apr 7, 2021

We don't have Xeon processor based CI runner today, but we can provision a few to the CI cluster.

@moderation
Copy link
Contributor

Is the proposed dependency open-source? I couldn't see a link to a repo on the link provided and a "Get Access" button indicating the code is behind a wall. Would be interesting to run the ipp-crypto dependency through the dependency policy at https://github.com/envoyproxy/envoy/blob/main/DEPENDENCY_POLICY.md

@rojkov
Copy link
Member

rojkov commented Apr 8, 2021

@moderation Here's the repo of the library https://github.com/intel/ipp-crypto

@ipuustin
Copy link
Member Author

ipuustin commented Apr 8, 2021

Thanks for the comments. It seems that writing a mock library for testing the multi-buffer operations using just regular crypto functions would be a good idea. I'll show the DEPENDENCY_POLICY.md list to the ipp-crypto team to get comments to the items there.

@htuch
Copy link
Member

htuch commented Apr 8, 2021

@ipuustin I'd definitely suggest adding a SECURITY.md and plan around how to handle security incidents for something this foundational to transport sec.

@github-actions
Copy link

github-actions bot commented May 8, 2021

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale stalebot believes this issue/PR has not been touched recently label May 8, 2021
@github-actions
Copy link

This issue has been automatically closed because it has not had activity in the last 37 days. If this issue is still valid, please ping a maintainer and ask them to label it as "help wanted" or "no stalebot". Thank you for your contributions.

lizan pushed a commit that referenced this issue Sep 27, 2021
Intel's IPP (Integrated Performance Primitives) crypto library has support for multi-buffer crypto operations. Briefly, multi-buffer
cryptography is implemented with AVX-512 instructions using a SIMD (single instruction, multiple data) mechanism. Up to eight RSA or ECDSA operations are gathered together into a buffer and processed at the same time, providing potentially improved performance. The AVX-512 instructions are available on recently launched 3rd generation Xeon Scalable server processors (Ice Lake server) processors.

This commit adds a private key provider to accelerate RSA and ECDSA crypto operations on recent Intel Xeon processors. Every worker thread has a queue of up-to-eight crypto operations. When the queue is full or when the timer is triggered, the queue is processed and all the pending handshakes are notified.

The potential performance benefit depends on many factors: the size of the cpuset Envoy is running on, incoming traffic pattern, encryption type (RSA or ECDSA), and key size. In my own testing I saw the biggest performance increase when long RSA keys were used on an Envoy running in a fairly limited environment serving lots of new incoming TLS requests. For more details, see this Intel whitepaper which contains some more information about the AVX-512 instructions and potential performance increase: https://www.intel.com/content/www/us/en/architecture-and-technology/crypto-acceleration-in-xeon-scalable-processors-wp.html

Additional Description:

One new dependency is introduced: Intel’s ipp-crypto library. Currently the PR is using a development version of ipp-crypto because BoringSSL support is not yet part of any release. The ipp-crypto team has indicated that BoringSSL version will be included in future ipp-crypto releases.

Basic tests are provided, and a fake library interface is included for testing on systems without the required AVX-512 instruction set.

Risk Level: Medium (TLS security feature, not enabled by default)
Testing: Unit tests
Docs Changes: API interface is documented
Release Notes: Added CryptoMB private key provider to contrib.
Platform Specific Features: Requires Intel 3rd generation Xeon Scalable server processor for the AVX-512 IFMA instruction set.
Fixes: #15871

Signed-off-by: Ismo Puustinen <ismo.puustinen@intel.com>
Co-authored-by: Greg Greenway <ggreenway@apple.com>
soulxu pushed a commit to soulxu/envoy that referenced this issue Oct 16, 2021
Intel's IPP (Integrated Performance Primitives) crypto library has support for multi-buffer crypto operations. Briefly, multi-buffer
cryptography is implemented with AVX-512 instructions using a SIMD (single instruction, multiple data) mechanism. Up to eight RSA or ECDSA operations are gathered together into a buffer and processed at the same time, providing potentially improved performance. The AVX-512 instructions are available on recently launched 3rd generation Xeon Scalable server processors (Ice Lake server) processors.

This commit adds a private key provider to accelerate RSA and ECDSA crypto operations on recent Intel Xeon processors. Every worker thread has a queue of up-to-eight crypto operations. When the queue is full or when the timer is triggered, the queue is processed and all the pending handshakes are notified.

The potential performance benefit depends on many factors: the size of the cpuset Envoy is running on, incoming traffic pattern, encryption type (RSA or ECDSA), and key size. In my own testing I saw the biggest performance increase when long RSA keys were used on an Envoy running in a fairly limited environment serving lots of new incoming TLS requests. For more details, see this Intel whitepaper which contains some more information about the AVX-512 instructions and potential performance increase: https://www.intel.com/content/www/us/en/architecture-and-technology/crypto-acceleration-in-xeon-scalable-processors-wp.html

Additional Description:

One new dependency is introduced: Intel’s ipp-crypto library. Currently the PR is using a development version of ipp-crypto because BoringSSL support is not yet part of any release. The ipp-crypto team has indicated that BoringSSL version will be included in future ipp-crypto releases.

Basic tests are provided, and a fake library interface is included for testing on systems without the required AVX-512 instruction set.

Risk Level: Medium (TLS security feature, not enabled by default)
Testing: Unit tests
Docs Changes: API interface is documented
Release Notes: Added CryptoMB private key provider to contrib.
Platform Specific Features: Requires Intel 3rd generation Xeon Scalable server processor for the AVX-512 IFMA instruction set.
Fixes: envoyproxy#15871

Signed-off-by: Ismo Puustinen <ismo.puustinen@intel.com>
Co-authored-by: Greg Greenway <ggreenway@apple.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/tls design proposal Needs design doc/proposal before implementation stale stalebot believes this issue/PR has not been touched recently
Projects
None yet
7 participants