modular arithmetic based calculation (including CRT)
Direct computation on multi-precision integers requires hardware resources that grow on the order of Ο (n2), which quickly becomes impractical. If a large integer can be decomposed into a set of smaller integers and arithmetic operations are performed on those components, the required hardware scale for numerical computation circuits can, in some cases, be significantly reduced.
For multi-precision multiplication, methods based on the FFT are well known. In this article, however, we introduce numerical calculation using a modular arithmetic system based on the Chinese Remainder Theorem (CRT).
Note:
For operands with fewer than several thousand digits,
for cases where calculations cannot be carried out entirely within the residue system,
or when modular arithmetic is not efficiently supported in hardware,
the advantages over methods such as the Karatsuba algorithm are limited.
This is because modular reduction by constants itself has a non-negligible computational cost,
and reconstruction of the original integer requires handling large numbers through a sum of products with multiplicative inverses.
A5 = 2 (mod 5),
A7 = 0 (mod 7),
A9 = 6 (mod 9),
A11 = 5 (mod 11),
A13 = 6 (mod 13),
A16 = 5 (mod 16)
B5 = 2 (mod 5),
B7 = 6 (mod 7),
B9 = 7 (mod 9),
B11 = 5 (mod 11),
B13 = 9 (mod 13),
B16 = 12 (mod 16)
Next, we compute the residues of their squares, SA and SB:
SA5 = 4 (mod 5),
SA7 = 0 (mod 7),
SA9 = 0 (mod 9),
SA11 = 3 (mod 11),
SA13 = 10 (mod 13),
SA16 = 9 (mod 16)
SB5 = 4 (mod 5),
SB7 = 1 (mod 7),
SB9 = 2 (mod 9),
SB11 = 3 (mod 11),
SB13 = 3 (mod 13),
SB16 = 0 (mod 16)
Taking the sum of the residues, we obtain:
S5 = 3 (mod 5),
S7 = 1 (mod 7),
S9 = 4 (mod 9),
S11 = 6 (mod 11),
S13 = 0 (mod 13),
S16 = 9 (mod 16)
Thus, the result has been obtained within the residue number system.I5 = 576576,
I7 = 205920,
I9 = 320320,
I11 = 196560,
I13 = 277200,
I16 = 585585
The original integer value is then reconstructed as the sum of products of each residue and its corresponding inverse:
X = ( (S5 × I 5) (mod n)
+ (S7 × I 7) (mod n)
+ (S9 × I 9) (mod n)
+ (S11 × I11) (mod n)
+ (S13 × I13) (mod n)
+ (S16 × I16) (mod n) ) (mod n)
= 297193 (mod n)
Numerical calculation based on the Chinese Remainder Theorem is attractive because it allows arithmetic operations to be decomposed into smaller, independent computations. This decomposition naturally enables parallel execution, which is one of the primary motivations for using CRT-based methods in hardware-oriented designs.
In addition, the freedom in choosing a set of pairwise coprime moduli provides a means to control the granularity of parallel computation. This allows designers to balance word length, resource usage, and throughput according to the target architecture and performance requirements.
However, practical implementations must take the following points into account:
As a result, on DSPs and FPGAs, the relative advantage of Karatsuba multiplication, FFT-based methods, or CRT-based residue number systems varies significantly depending on the target bit width, available parallel resources, and computation pattern.
From the perspective of multi-precision arithmetic, FFT-based methods and CRT-based methods are essentially similar in that they both decompose a large number into orthogonal components to reduce computational complexity through parallelism.
If a large number can be decomposed into n orthogonal components, the computation can be performed using n smaller calculations. Choosing unit vectors of the form eiωnt leads to the Fourier transform, while choosing pairwise coprime integers leads to the Chinese Remainder Theorem.



© 2000 Takayuki HOSODA.