An Example of Numerical Calculation Using Modular Arithmetic

Modular arithmetic-based calculation (including CRT)

2026-01-04

Takayuki HOSODA

Overview

Direct computation on multi-precision integers requires hardware resources that grow on the order of O(n²) with the operand length n, which quickly becomes impractical. If a large integer can be decomposed into a set of smaller integers and arithmetic operations are performed on those components, the required hardware scale for numerical computation circuits can, in some cases, be significantly reduced.

For multi-precision multiplication, methods based on the FFT are well known. In this article, however, we introduce numerical calculation using a modular arithmetic system based on the Chinese Remainder Theorem (CRT).

Note:
For operands with fewer than several thousand digits, for cases where calculations cannot be carried out entirely within the residue system, or when modular arithmetic is not efficiently supported in hardware, the advantages over methods such as the Karatsuba algorithm are limited. This is because modular reduction by constants itself has a non-negligible computational cost, and reconstruction of the original integer requires handling large numbers through a sum of products with multiplicative inverses.

Example of Numerical Calculation Using a Residue Number System Based on CRT

As an example, let us compute the sum of squares of two positive 9-bit integers A, B using a residue number system:

X = A² + B²

Since the sum of squares of two 9-bit integers requires at most 19 bits, we choose a set of pairwise coprime integers d₁, d₂, ⋯, d_t such that

2¹⁹ = 524,288 ≤ d₁ d₂ ⋯ d_t

As one example, consider the set {5,7,9,11,13,16}. The product of these elements is

2¹⁹ = 524,288 < n = 5 × 7 × 9 × 11 × 13 × 16 = 720,720
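
The two conditions on the moduli (pairwise coprimality and a product of at least 2¹⁹) can be checked mechanically. The following is a minimal Python sketch rather than part of the original method; the names MODULI, REQUIRED_RANGE, and is_pairwise_coprime are introduced here only for illustration.

```python
from itertools import combinations
from math import gcd, prod

MODULI = (5, 7, 9, 11, 13, 16)  # the set chosen in the text
REQUIRED_RANGE = 2 ** 19        # bound on A**2 + B**2 for 9-bit A, B


def is_pairwise_coprime(moduli):
    """Return True if every pair of moduli has gcd 1."""
    return all(gcd(a, b) == 1 for a, b in combinations(moduli, 2))


n = prod(MODULI)  # dynamic range of the residue number system

print(is_pairwise_coprime(MODULI))  # True
print(n, n > REQUIRED_RANGE)        # 720720 True
```

Note that 9 and 16 are not prime; the moduli only need to be coprime to one another.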

which satisfies the above condition, and we therefore use this set. Let the two given integers be A = 357 and B = 412. Taking the residues of A and B modulo each element of the set {5,7,9,11,13,16}, we obtain:

A₅  = 2 (mod  5),
A₇  = 0 (mod  7),
A₉  = 6 (mod  9),
A₁₁ = 5 (mod 11),
A₁₃ = 6 (mod 13),
A₁₆ = 5 (mod 16)

B₅  =  2 (mod  5),
B₇  =  6 (mod  7),
B₉  =  7 (mod  9),
B₁₁ =  5 (mod 11),
B₁₃ =  9 (mod 13),
B₁₆ = 12 (mod 16)
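
The conversion into the residue system is one remainder operation per modulus. As a small sketch in Python (the helper name to_residues is an assumption made for this illustration), the residues listed above can be reproduced as follows:

```python
MODULI = (5, 7, 9, 11, 13, 16)


def to_residues(x, moduli=MODULI):
    """Forward conversion: reduce x modulo each modulus."""
    return tuple(x % d for d in moduli)


A, B = 357, 412
print(to_residues(A))  # (2, 0, 6, 5, 6, 5)
print(to_residues(B))  # (2, 6, 7, 5, 9, 12)
```

In hardware, each of these reductions would be a separate small circuit; the software loop only models the result.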

Next, we compute the residues of their squares, SA and SB:

SA₅  =  4 (mod  5),
SA₇  =  0 (mod  7),
SA₉  =  0 (mod  9),
SA₁₁ =  3 (mod 11),
SA₁₃ = 10 (mod 13),
SA₁₆ =  9 (mod 16)

SB₅  = 4 (mod  5),
SB₇  = 1 (mod  7),
SB₉  = 4 (mod  9),
SB₁₁ = 3 (mod 11),
SB₁₃ = 3 (mod 13),
SB₁₆ = 0 (mod 16)

Taking the sum of the residues, we obtain:

S₅  = 3 (mod  5),
S₇  = 1 (mod  7),
S₉  = 4 (mod  9),
S₁₁ = 6 (mod 11),
S₁₃ = 0 (mod 13),
S₁₆ = 9 (mod 16)
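
Squaring and adding act on each residue channel independently, with no carries between channels. A minimal Python sketch of these two steps, assuming hypothetical helpers rns_mul and rns_add that are not named in the original text, might look like this:

```python
MODULI = (5, 7, 9, 11, 13, 16)


def rns_mul(x, y, moduli=MODULI):
    """Channel-wise product of two residue tuples."""
    return tuple((a * b) % d for a, b, d in zip(x, y, moduli))


def rns_add(x, y, moduli=MODULI):
    """Channel-wise sum of two residue tuples."""
    return tuple((a + b) % d for a, b, d in zip(x, y, moduli))


a_res = (2, 0, 6, 5, 6, 5)    # residues of A = 357
b_res = (2, 6, 7, 5, 9, 12)   # residues of B = 412

sa = rns_mul(a_res, a_res)    # residues of A**2
sb = rns_mul(b_res, b_res)    # residues of B**2
s = rns_add(sa, sb)           # residues of A**2 + B**2

print(sa)  # (4, 0, 0, 3, 10, 9)
print(sb)  # (4, 1, 4, 3, 3, 0)
print(s)   # (3, 1, 4, 6, 0, 9)
```

Every intermediate value stays below 16² = 256, so each channel needs only a few bits regardless of how large the represented integer is.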

Thus, the result has been obtained within the residue number system.
Further computations may be continued in this form. However, to reconstruct the result as an ordinary integer, we compute, for each modulus d_j, the multiplicative inverse of n / d_j modulo d_j and multiply it by n / d_j. The resulting coefficients I_j are

I_j = (n / d_j) · ((n / d_j)⁻¹ mod d_j), so that I_j ≡ 1 (mod d_j) and I_j ≡ 0 (mod d_k) for k ≠ j

The inverses are obtained using the extended Euclidean algorithm, yielding:

I₅  = 576576,
I₇  = 205920,
I₉  = 320320,
I₁₁ = 196560,
I₁₃ = 277200,
I₁₆ = 585585
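
The values listed above can be reproduced by taking the inverse of n / d_j modulo d_j with the extended Euclidean algorithm and multiplying it by n / d_j. The following Python sketch shows one way to do this; the function names extended_gcd and crt_coefficients are assumptions made for illustration, not names from the original text.

```python
from math import prod

MODULI = (5, 7, 9, 11, 13, 16)


def extended_gcd(a, b):
    """Return (g, x, y) with a*x + b*y == g == gcd(a, b)."""
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    return old_r, old_x, old_y


def crt_coefficients(moduli=MODULI):
    """Return I_j = (n/d_j) * ((n/d_j)^-1 mod d_j) for each modulus d_j."""
    n = prod(moduli)
    coefficients = []
    for d in moduli:
        partial = n // d                          # n / d_j
        g, inv, _ = extended_gcd(partial % d, d)  # inverse of n/d_j modulo d_j
        assert g == 1, "moduli must be pairwise coprime"
        coefficients.append(partial * (inv % d))
    return coefficients


print(crt_coefficients())
# [576576, 205920, 320320, 196560, 277200, 585585]
```

In Python 3.8 and later, pow(partial, -1, d) returns the same modular inverse and could replace the explicit extended_gcd call. Because the coefficients depend only on the moduli, they can be precomputed once and stored as constants.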

The original integer value is then reconstructed as the sum of products of each residue S_j and its corresponding coefficient I_j:


X = ( (S₅  × I₅) (mod n)
    + (S₇  × I₇) (mod n)
    + (S₉  × I₉) (mod n)
    + (S₁₁ × I₁₁) (mod n)
    + (S₁₃ × I₁₃) (mod n)
    + (S₁₆ × I₁₆) (mod n) ) (mod n)
  = 297193 (mod n)
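
As a cross-check, the reconstruction can be sketched in Python using the coefficients I_j and residues S_j given in the text. The names COEFFICIENTS and from_residues are assumptions introduced here for illustration.

```python
N = 720720  # product of the moduli 5, 7, 9, 11, 13, 16
COEFFICIENTS = (576576, 205920, 320320, 196560, 277200, 585585)  # I_j from the text


def from_residues(residues, coefficients=COEFFICIENTS, n=N):
    """Reverse conversion: weighted sum of residues, reduced modulo n."""
    total = 0
    for s, i in zip(residues, coefficients):
        total = (total + s * i) % n   # reduce at each step to keep values small
    return total


s = (3, 1, 4, 6, 0, 9)       # residues of A**2 + B**2 from the text
x = from_residues(s)
print(x)                      # 297193
print(x == 357**2 + 412**2)   # True
```

This step is the only one that handles numbers as wide as n, which is why the reconstruction cost matters when comparing against direct multi-precision arithmetic.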

Summary

Numerical calculation based on the Chinese Remainder Theorem is attractive because it allows arithmetic operations to be decomposed into smaller, independent computations. This decomposition naturally enables parallel execution, which is one of the primary motivations for using CRT-based methods in hardware-oriented designs.

In addition, the freedom in choosing a set of pairwise coprime moduli provides a means to control the granularity of parallel computation. This allows designers to balance word length, resource usage, and throughput according to the target architecture and performance requirements.

However, practical implementations must take the following points into account:

The computational cost of modular reduction
Overall optimization, including multiplicative inverse computation and the adder tree required for reconstruction

As a result, on DSPs and FPGAs, the relative advantage of Karatsuba multiplication, FFT-based methods, or CRT-based residue number systems varies significantly depending on the target bit width, available parallel resources, and computation pattern.

From the perspective of multi-precision arithmetic, FFT-based methods and CRT-based methods are essentially similar in that they both decompose a large number into orthogonal components to reduce computational complexity through parallelism.

If a large number can be decomposed into n orthogonal components, the computation can be performed using n smaller calculations. Choosing unit vectors of the form e^(iω_n t) leads to the Fourier transform, while choosing pairwise coprime integers leads to the Chinese Remainder Theorem.
