Originally posted by Lanny
what is an ultra low latency fpga spectral?
Real-Time FPGA Numerical Computing for Ultra-Low Latency High Frequency Trading (ULL-HFT)
Khaled Aly, Technical Author & Research Analyst
2/18/2015 03:55 PM EST
Emerging capital market high-frequency trading (HFT) is bringing strong FPGA use cases in networking, messaging, and financial computing acceleration.
This article is a sequel to my previous column: Introducing FPGA-Based Acceleration for High-Frequency Trading. This new column elaborates on the merits of Decimal Floating Point Arithmetic (DFPA) acceleration and presents a numerical computing model for ULL-HFT (Ultra-Low Latency High Frequency Trading) environments, developed while I headed Technical Business Development at SilMinds, Inc. A use case synopsis is as follows:
The ongoing struggle to minimize "slippage" -- broadly defined as the latency between the instant a trader executes an order and the instant the transaction is actualized at the exchange, with the consequent uncalculated monetary variations -- pushes HFT technology towards a theoretical 'zero-latency' objective. Conventional approaches such as exchange proximity hosting, colocation, hardware ticker plants, and lossless LAN switches have been superseded by FPGA acceleration, which offloads network and application protocols and runs trader processes such as portfolio order and execution management. The need for intense, real-time numerical computing of hundreds of financial indicators and/or indices at sub-millisecond tick rates is often accompanied by decimal accuracy compliance requirements.
Decimal Floating-Point Arithmetic (DFPA) in a nutshell
Binary Floating Point Arithmetic (BFPA) runs efficiently in native processor hardware, but is unable to represent or maintain decimal real numbers exactly. This is because a binary fraction is a finite sum of negative powers of two (1/2, 1/4, 1/8, ...), and such sums do not cover the entire decimal fraction numeric space; in fact, even the decimal quantity 0.1 cannot be represented exactly. The choice of whether, and how, to implement decimal real number accuracy is left up to the software developer. The statistical nature of most trading computations inherently affords higher tolerance to binary real number inaccuracies than deterministic financial applications (e.g., banking). However, financial regulations and some algorithmic and operational considerations may require DFP encoding and arithmetic.
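The representation problem described above can be seen in a few lines. The following is a minimal Python sketch, using only the standard library's decimal and fractions modules as a software stand-in for decimal arithmetic; it is an illustration, not part of the article's hardware design.

```python
from decimal import Decimal
from fractions import Fraction

# Fraction(float) recovers the exact rational value held by the binary double.
stored = Fraction(0.1)
print(stored == Fraction(1, 10))   # the stored value is not exactly 1/10
print(stored.denominator)          # the denominator is a power of two

# The same sum compared in binary and in decimal arithmetic.
binary_ok = (0.1 + 0.2 == 0.3)
decimal_ok = (Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))
print(binary_ok, decimal_ok)
```

The binary comparison fails because each operand is already an approximation before the addition takes place, while the decimal comparison holds because every operand is stored exactly.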
Approaches to maintain DFP accuracy include the following:
Deploy server platforms with DFPA processor support, such as the IBM Power and Oracle SPARC, which support basic operations like addition and multiplication in hardware and build the more demanding operations algorithmically in software.
Scale up all real numbers to integers, perform all-integer computations, and then down-scale them before they are passed to other processes. This approach makes code hard to manage, especially when quantities carry non-uniform precisions, which may reach 14 places beyond the decimal point (Ref: S&P Dow Jones Indices, "S&P Global 1200 Methodology," Apr. 2011).
Use software DFPA libraries, such as the Intel Math Decimal Floating Point Library, and bear with the consequent computational latency.
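To illustrate the second and third approaches side by side, here is a minimal Python sketch. The helper to_scaled, the two precisions, and the sample values are hypothetical and chosen only for illustration; Python's decimal module stands in for a software DFPA library.

```python
from decimal import Decimal

def to_scaled(text: str, places: int) -> int:
    """Hypothetical helper: parse a decimal string into an integer scaled by 10**places."""
    whole, _, frac = text.partition(".")
    if len(frac) > places:
        raise ValueError(f"{text!r} needs more than {places} decimal places")
    return int(whole + frac.ljust(places, "0"))

# Assumed precisions: prices carry 2 places, weights carry 4.
PRICE_PLACES = 2
WEIGHT_PLACES = 4

# Scaled-integer approach: the programmer must track every scale by hand.
price = to_scaled("101.25", PRICE_PLACES)
weight = to_scaled("0.0375", WEIGHT_PLACES)
product = price * weight                        # scale is now 2 + 4 = 6 places
product_places = PRICE_PLACES + WEIGHT_PLACES
scaled_result = Decimal(product).scaleb(-product_places)

# Software decimal approach: the exponent travels with each value.
decimal_result = Decimal("101.25") * Decimal("0.0375")

print(scaled_result, decimal_result)
```

Both routes give the same exact product, but the scaled-integer version has to carry the scale of every intermediate result in separate bookkeeping, which is the code management burden mentioned above.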
To address these trade-offs, SilMinds offers a patented, extensively verified library of 64/128-bit DFPA IP units that comply with the IEEE 754-2008 standard and cover operations like division, power, square root, and indexed summation. The units internally employ the hardware-efficient, BCD-like DPD (Densely Packed Decimal) encoding, but their I/O interfaces also support the more compact, software-oriented BID (Binary Integer Decimal) encoding and ASCII "string"-based encodings.
Real-time out-of-band numerical computing model
Many algorithmic traders choose to conduct real number arithmetic in a software DFP form or workaround, and then accept the undesirable increase in slippage. By comparison, at currently available FPGA clock rates, the SilMinds library offers DFPA operations that run on the order of nanoseconds.
A major added value is integrating the DFPA units with the "performance costly" conversions to and from the computation-agnostic "string" number representations used by the standard Financial Information eXchange (FIX) protocol and by other legacy proprietary protocols. Since BFPA units are available at low cost and small area, thanks to their ubiquitous use as DSP blocks, it is easy to optimize FPGA utilization with a combination of integer, BFP, and DFP arithmetic units.
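As a rough software picture of the string conversion step, the sketch below parses a FIX-style tag=value message and turns the price string into an integer coefficient and a decimal exponent, which is the kind of pair a decimal encoding carries. The function names and the sample message are hypothetical, and the code says nothing about how the SilMinds units perform the conversion in hardware.

```python
from decimal import Decimal

SOH = "\x01"  # FIX field delimiter

def parse_fix(message: str) -> dict:
    """Split a FIX tag=value message into a {tag: value-string} mapping."""
    fields = {}
    for pair in message.strip(SOH).split(SOH):
        tag, _, value = pair.partition("=")
        fields[int(tag)] = value
    return fields

def to_coefficient_exponent(text: str) -> tuple:
    """Convert a finite decimal string to (signed integer coefficient, exponent)."""
    sign, digits, exponent = Decimal(text).as_tuple()
    coefficient = int("".join(map(str, digits)))
    return (-coefficient if sign else coefficient), exponent

# Illustrative order fragment: 55 = Symbol, 44 = Price, 38 = OrderQty.
sample = SOH.join(["35=D", "55=ABC", "44=101.25", "38=300"]) + SOH
fields = parse_fix(sample)
print(to_coefficient_exponent(fields[44]))
```

Every numeric field goes through this kind of conversion on the way in and the reverse on the way out, which is why the conversion cost matters at high tick rates.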
The out-of-band model comprises real-time hardware numeric computation of any number of indicators and/or indices, using the string-to-DPD converted values passed by the FIX decoder. All of this happens within the tick period, so that by the following tick the resultant decimal values are accessible in a pre-allocated RAM space, where they can be used simultaneously by all algorithms, which may be implemented in an arbitrary combination of software and hardware. DFPA engines can be integrated with network stack offloads on a single FPGA or on a multi-FPGA PCI-e card, employing the appropriate amount of parallelism to meet the desired frequency.
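The data flow of this model can be mimicked in software. The following Python sketch is only an analogy under stated assumptions: the class IndicatorBoard, its methods, and the choice of a simple moving average as the indicator are all hypothetical, and a dictionary stands in for the pre-allocated RAM space.

```python
from collections import deque
from decimal import Decimal

class IndicatorBoard:
    """Hypothetical stand-in for the shared, pre-allocated result area."""

    def __init__(self, symbols, window):
        self._windows = {s: deque(maxlen=window) for s in symbols}
        self._published = {s: None for s in symbols}  # slots read by algorithms

    def on_tick(self, prices):
        """Compute each symbol's moving average from decoded price strings."""
        staged = {}
        for symbol, text in prices.items():
            window = self._windows[symbol]
            window.append(Decimal(text))
            staged[symbol] = sum(window, Decimal(0)) / len(window)
        # Publish all results together once the tick's computation is done.
        self._published.update(staged)

    def read(self, symbol):
        """Return the latest published value for a symbol."""
        return self._published[symbol]

board = IndicatorBoard(["ABC"], window=3)
for text in ["10.10", "10.20", "10.30", "10.40"]:
    board.on_tick({"ABC": text})
print(board.read("ABC"))
```

The point of the separation is that algorithms only ever read finished values from the shared area, while the computation for the current tick proceeds independently of them.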
Benchmarking has proved that hardware DFPA yields an accurate, jitter-free speed-up as high as 1000X with respect to optimized software, implying sub-microsecond operation run time, hundreds to thousands of times as many operations per "future-proof" tick (one second to a microsecond), and/or deeper moving averages across an arbitrarily larger number of securities (symbols). While most exchanges currently publish trade prices at millisecond ticks, trader algorithms may require sub-microsecond rate computations to control slippage.
The following generically portrays three use cases of the described model: