Mark Friedenbach

Making freicoind faster and safer with fewer dependencies (simpler demurrage calculations)

Recommended Posts

A little bit of a technical update I want to share. As part of the refactoring and rebasing effort I've worked out a way to eliminate Freicoin's extra dependencies while at the same time achieving a 10x performance boost for the mathematical calculations we use these libraries for. I'm quite excited about this, although for reasons that might be obscure for those without a technical background. If you don't understand this post, that's okay. The TL;DR is that we can make Freicoin faster while further aligning the code base with bitcoin to minimize the maintenance burden.

Freicoin uses the GNU MP and MPFR multi-precision arithmetic libraries for performing cross-platform portable demurrage calculations within consensus code. Calculating demurrage factors involves taking the per-block demurrage rate r = (2-20 - 1)/(2-20) and raising it to the nth power, where n is the relative depth of the output (next block height - output's refheight). The fastest way to do this on workstation or server-class hardware is to use the CPU's floating point unit to perform the exponentiation. Unfortunately there is just absolutely no way that this can be done safely and consistently inside of consensus code. CPU manufacturers do not guarantee the bit-for-bit accuracy of operations compared with other CPUs of the same target architecture, much less with reference values or cross-platform targets. There was at one time a dream of decimal floating point (IEEE 754-2008) which would have had reproducibility as a requirement, but alas that never took off.

The solution we adopted in 2012 was to use GNU MP (a multi-precision exact big integer and rational arithmetic library) and GNU MPFR (a multi-precision software floating-point library with rigidly defined semantics) to implement precise, cross-platform software floating-point demurrage accounting. This has the advantage of getting the job done, but there are some downsides:

  • Performance. Freicoin suffers from this use of external multi-precision arithmetic libraries. Although optimized for speed GNU MP simply cannot approach the raw performance of directly adding two 64-bit numbers in hardware, and the multi-precision heap allocations add overhead and indirect storage causing latency. MPFR on the other hand is designed for precise, cross-platform behavior in simulations, not frequent high performance calculations. It is also overkill given that we use the library for one single calculation only, for which a generic library like MPFR would not provide support for specialized optimizations.
  • Reliability. Both GNU MP and MPFR use dynamic (heap allocated) memory for their bignum representations. That means that there potentially is allocation or reallocation happening every single time a call is made to GNU MP or MPFR arithmetic functions, and potentially an out of memory exception is generated. Originally this was every single time any kind of arithmetic was done on freicoin values. With the recent input-truncation soft-fork it is already possible to reduce this to just demurrage calculations. However that still means that there is the possibility for abnormal or transient exceptions to occur in code that in bitcoin consists of normal arithmetic operations. This code was not audited for failure modes as a result of this different behavior, and there is added risk of local node instability as a result.
  • Maintenance. Bitcoin has special scripts to make reproducible builds of all of its dependencies. These scripts are custom developed and maintained by a sizable Bitcoin Core development team, and these build scripts are very frequently updated as bitcoin matures. These scripts would have to be modified to include reproducible builds of GNU MP and MPFR, and those changes maintained and re-integrated with each new release of bitcoin. This is a bit of a sore point for me because I originally fell off the wagon of providing regular maintenance updates to Freicoin on release 0.9, in which the build system changed quite drastically due to the switch to autotools.
  • Legal. GNU MP and GNU MPFR are both distributed under the terms of the LGPL. Because official Freicoin binary distributions are statically linked, that meant that they are also GPL programs. Now I'm a strong advocate of libre software, user rights, and the GPL, so on the face of it this was not too concerning to me back in 2012. However the Freicoin software repository says like bitcoin that it is MIT/X11 software licensed, and these sort of viral software licensing traps are difficult to see, so I can see how someone might run into GPL license violations as a result, without intent of doing so. There is a permissively licensed fork / re-implementation of GNU MP that we could use, but not alternative for MPFR.

Given that the rebase to bitcoin master (0.16 as of now) would require re-doing the pieces which use GNU MP and MPFR anyway, it seemed like an opportune time to look at replacing these dependencies with something faster, safer, in-house, and permissively licensed. Thanks to input truncation, all the regular accounting can be reverted back to 64-bit integer arithmetic instead of multi-precision rational numbers. The trickier part is the demurrage calculation, which still needs to do an internal real-number precise exponentiation.

To remove the dependency on GNU MPFR I wrote an integer-math-only exponentiation of the freicoin demurrage rate using pre-calculated look-up tables, and through exhaustive testing proved it to be bit-for-bit identical in its results compared with the GNU MPFR version. I then went through and applied as many approximations and reductions of internal precision as I could without generating incorrect results, to get the absolute fastest implementation. Here's some raw numbers:

Quote

    mpfr_mul:    2.215 ms  (      0,67108864,       0)
    mpfr_small:  2.058 ms  (      8,67108848,       8)
    apu_mul:       200 ns  (      0,67108864,       0)
    apu_small:     119 ns  (      0,67108864,       0)
    f32_mul:        49 ns  (9812683,37436372,19859809)
    f64_mul:        46 ns  (      0,47652674,19456190)
    f80_mul:       122 ns  (3426315,46701202,16981347)

    mpfr_div:    2.435 ms  (      0,67108864,       0)
    f32_div:        50 ns  (  17275,32623150,34468439)
    f64_div:        49 ns  (      0,40281993,26826871)
    f80_div:       132 ns  (      0,44210179,22898685)

The first column is the algorithm name. The "mpfr_mul" is the algorithm currently used in freicoin, and "mpfr_div" and "mpfr_small" are variations upon it. The "small" variation reduces internal precision to as low as can be done without errors in exhaustive testing, which would be a cheap one-line change to the existing method. The "apu" algorithm is the new integer-math-only implementation using pre-generated lookup tables for the exponentiation. The "f32"/"f64"/"f80" algorithms use single-precision (float), double precision, and extended precision (long double) floating-point math on the CPU, as a point of comparison for timing. The "div" variations calculate the inverse of the demurrage rate, which is potentially a faster exponentiation to perform albeit at the price of a slow division at the end. There is an "apu_div" method, but it is unfinished so I did not include numbers for it.

The second column gives the average run time per evaluation of the demurrage calculation on my laptop. Neither the "mpfr" nor "apu" methods have constant time performance -- the runtime speed should have some logarithmic dependency on the number of set bits in the integer representation of the relative depth (the exponent in the demurrage calculation)--a point I'll get back to in a minute. For the "mpfr" family this effect is small, but for the "apu" family it is more pronounced and the worst case performance should be about twice what is reported here. But conversely the expected execution time depends on the average age of UTXOs, which would be way less than the exhaustive testing reported here. The hardware accelerated floating point operations should be constant-time on most CPUs. The numbers above were calculated based on an exhaustive test, for which the "average" age of a UTXO is about 1,000 years. Here are more realistic numbers with an average age of 1 year:

Quote

    mpfr_mul:    1698 ns  (    0,65536,    0)
    mpfr_small:  1602 ns  (    0,65536,    0)
    apu_mul:      123 ns  (    0,65536,    0)
    apu_small:     83 ns  (    0,65536,    0)
    f32_mul:       47 ns  (32811,    1,32724)
    f64_mul:       63 ns  (    0,34867,30669)
    f80_mul:      118 ns  (    0,49566,15970)

    mpfr_div:    1991 ns  (    0,65536,    0)
    f32_div:       76 ns  (16710,    1,48825)
    f64_div:       50 ns  (    0,   34,65502)
    f80_div:      111 ns  (    0,12643,52893)

Note that mpfr_small and apu_small are 22% and 30% faster, respectfully. (Also note that apu_small is about as fast as performing the exponentiation on actual floating point hardware! That was a nice surprise.)

The final column gives a triplet of numbers reporting (1) the number of errors found in exhaustive testing in which the demurrage was greater; (2) the number of test cases which matched the reference implementation; and (3) the number of errors found in which the demurrage was less than the reference implementation (mpfr_mul). Note the intrinsic errors involved with using hardware floating point, even for "f64" and "f80" which unlike "f32" should have had sufficient internal precision to perform this calculation exactly. Hardware floating-point is not to be trusted.

The clear winning approach is "apu_small", and I will be replacing the GNU GMPR demurrage calculation code with this technique, using an implementation that has been exhaustively tested to be 1:1 identical in results with the multi-precision implementation that is presently in the Freicoin consensus code.

What I did not expect going into this is that this implementation is not just as good as MPFR's general approach, but is in fact nearly 20x more efficient in common use cases! And I am blown away that it is nearly as fast as computing demurrage directly in hardware floating point.

While the difference is unlikely to have profound effects outside of a few special cases (e.g. calling 'getbalance" on a very large wallet), it will nevertheless reduce latency of various transaction processing and UI code, thereby delivering a better user experience. Far more critically it will improve the safety and security of the Freicoin reference implementation while significantly reducing the work that goes into keeping Freicoin up to date with upstream bitcoin. That's exciting to me at least :)

You can expect a new version of freicoind with these improvements integrated to be out soon, as that is what I am currently working on.

Share this post


Link to post
Share on other sites

thx for your work!

I have some questions to the implementation of demurrage in Freicoin:

Is there still a need to have a refheight in the Freicoin transaction if so why?

Is the demurrage calculated on the uxto set, or on the transactions itself?

If on the transaction itself is then pruning of old transactions possible in Freicoin?

 

Share this post


Link to post
Share on other sites

Is there still a need to have a refheight in the Freicoin transaction if so why?

There's a couple of ways I can answer this question.

First, there is always a need for refheight in Freicoin transactions just by definition: it would be a hard-forking consensus change to remove them. This is the technically correct answer but mention it only for completeness as I assume it's not what you meant.

This doesn't affect the need for reference heights in any fundamental way. Reference heights address a problem that generally applies to demurrage in bitcoin-like block chains: the demurrage rate reduces the inputs of a transaction without affecting the outputs, so if naively evaluated at block height it would introduce a back-door transaction expiry as at some point (which can be calculated) the transaction will have less input than output, resulting in a negative fee and consensus violation if it were then included in a block. There are basically three approaches to fixing this: (1) allow negative fees so long as the block as a whole doesn't have negative fees in aggregate; (2) assign a specific height to a transaction, less than the current block height, and use that for demurrage calculations so it is independent of block height; or (3) forget demurrage in the protocol and have an exponentially inflating currency instead. The first approach didn't work in 2012 as child-pays-for-parent wasn't wasn't well supported, and still isn't to some degree. The third approach we rejected for experimental reasons, although it would have had the nice property of pulling demurrage calculations out of consensus code and block validation logic. The second approach is what we did and what we're stuck with, at least on the present and future soft-fork compatible incarnations of the Freicoin block chain.

Reaching more into an aside, I will say that it was never necessary that Freicoin transactions have an explicit reference height field at all, which is what breaks compatibility with bitcoin tools. The semantics of the reference height are that it is a block number which must be greater than or equal to each of its inputs reference heights, and less than or equal to the current block height. This happens to be a more strict subset of the rules for transaction lock-time! So an extra refheight isn't really necessary and never was--we could have just double-purposed the nLockTime field to be a reference height, making its use mandatory and adding extra constraints and restrictions. But alas we didn't notice this until years after the initial release of Freicoin, and that ship has sailed leaving us stuck with reference heights as an explicit extra field. Oops.

Is the demurrage calculated on the uxto set, or on the transactions itself?

Reference height of inputs are stored in the UTXO set; demurrage is calculated during transaction validation. In bitcoin there is a check that sum(inputs) >= sum(outputs) in any context where a transaction needs to be validated -- e.g. when adding to your mempool or when encountered in a block. In freicoin we do the exact same thing except in the summation of inputs we adjust each term by the demurrage rate calculated from the difference of the transaction's reference height from its input's reference height.

> If on the transaction itself is then pruning of old transactions possible in Freicoin?

Yes, there has never been any restrictions on pruning old blocks. There just hasn't been an official release that supports this feature, which was added in bitcoin v0.11.0.

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now