stats.awk

stream statistics utility

2026-02-05
Takayuki HOSODA

Overview

stats.awk is a simple one-pass streaming statistics utility implemented in AWK. It reads numeric values from standard input or files and computes basic descriptive statistics (mean, variance, standard deviation, min/max) and, optionally, a linear regression y = a + b x. All computations are performed in a single pass over the input stream. It is intended for use in Unix pipelines on FreeBSD and Linux systems.
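
To illustrate what a single-pass computation of these statistics can look like, here is a minimal sketch using Welford's running update. It is not taken from stats.awk, whose internals are not shown here; the function name `onepass_stats` and the output layout are hypothetical.

```sh
# Hypothetical helper, NOT part of stats.awk: one-pass count, mean,
# unbiased variance, standard deviation, min and max of column 1.
onepass_stats() {
    awk '
    {
        n++
        d = $1 - mean            # deviation from the previous mean
        mean += d / n            # running mean
        m2 += d * ($1 - mean)    # running sum of squared deviations
        if (n == 1 || $1 > max) max = $1
        if (n == 1 || $1 < min) min = $1
    }
    END {
        if (n == 0) exit 1       # no input records
        var = (n > 1) ? m2 / (n - 1) : 0
        printf "count=%d mean=%.6g var=%.6g std=%.6g min=%.6g max=%.6g\n",
               n, mean, var, sqrt(var), min, max
    }' "$@"
}

printf '%s\n' 2 4 4 4 5 5 7 9 | onepass_stats
```

Each record updates a fixed number of accumulators, so memory use does not grow with the length of the stream.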

When processing single-column input with no xcol or ycol specified, stats.awk automatically performs linear regression using the record index as x and the column value as y. Descriptive statistics are always computed from the y values.
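
The idea of regressing a single column against its record index can be sketched as follows. This is an illustration under assumptions, not the stats.awk source: the function name `index_regression` is hypothetical, and the index is assumed to start at 1 (the AWK record number), which the text above does not specify.

```sh
# Hypothetical helper, NOT part of stats.awk: least-squares fit y = a + b x
# in one pass, with x = record index (assumed to start at 1) and y = column 1.
index_regression() {
    awk '
    {
        n++
        x = n; y = $1
        sx += x; sy += y         # running sums of x and y
        sxx += x * x; sxy += x * y
    }
    END {
        den = n * sxx - sx * sx
        if (n < 2 || den == 0) exit 1   # slope undefined
        b = (n * sxy - sx * sy) / den   # slope
        a = (sy - b * sx) / n           # intercept
        printf "a=%.6g b=%.6g\n", a, b
    }' "$@"
}

printf '%s\n' 3 5 7 9 | index_regression
```

Plain running sums keep the example short but can lose precision on long streams or large values; a production implementation may use a more numerically stable update.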

Features

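One-pass calculation of count, sum, mean, unbiased variance, unbiased standard deviation, maximum and minimum (with second peaks), and linear regression coefficients a and b.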

Requirements

An AWK implementation.
Tested on: gawk, nawk, FreeBSD awk

Installation

Using Makefile

    make install           # Installs script and man page to /usr/local
    make uninstall         # Removes installed files

You can override the prefix:

    make PREFIX=$HOME/.local install

Manual installation

    chmod +x stats.awk
    cp stats.awk /usr/local/bin/
    cp stats.awk.1 /usr/local/man/man1/

Usage

    awk -f stats.awk [options] [file ...]

Output format

TEXT FORMAT
Example:

    Count:             1400
    Mean:              1.1883143
    Sum:               1663.6401
    Unbiased Variance: 0.002564187
    Unbiased StdDev:   0.050637802
    Peak (Max):        1.353943
    2nd Peak:          1.353943
    Neg Peak (Min):    0.996242
    2nd Neg Peak:      1.000321
    Regression a:      1.2674196
    Regression b:      -0.00011292684

JSON FORMAT
Example:

    {
        "count": 1400,
        "mean": 1.1883143,
        "sum": 1663.6401,
        "var": 0.002564187,
        "std": 0.050637802,
        "max": 1.353943,
        "max2": 1.353943,
        "min": 0.996242,
        "min2": 1.000321,
        "regression": {
            "a": 1.2674196,
            "b": -0.00011292684
        }
    }
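
A single field can be pulled out of this JSON with jq. The sketch below assumes jq is installed and that the key names match the example above; the function name `slope_from_json` is hypothetical.

```sh
# Hypothetical helper: read stats.awk JSON on stdin and print the
# regression slope (the "b" key inside "regression").
slope_from_json() {
    jq -r '.regression.b'
}

# Intended use (requires stats.awk and a data file):
#   stats.awk -v out=json data.txt | slope_from_json
```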

Examples

Single-column input (automatic regression vs record index):

    seq 1 100 | stats.awk

CSV output:

    stats.awk -v out=csv data.txt

12-digit precision, scientific notation:

    stats.awk -v prec=12 -v fmt=sci data.txt

Linear regression on 1st and 2nd columns:

    stats.awk -v xcol=1 -v ycol=2 data.txt

JSON output for downstream processing:

    stats.awk -v out=json data.txt | jq .

Download

Download stats.awk-1.20.tar.gz (stats.awk script and manual page).

Hash values

The archive contains the files 'stats.awk.sha256' and 'stats.awk.md5', which hold the SHA-256 and MD5 hash values for stats.awk, respectively.

SHA256 (stats.awk) = 3dc37347c677aac25989dd7ee2f1af3fdade71a5c21a5fbe68ff3d3ad283d576
MD5 (stats.awk) = 110a558e7ca41edda3e0052bb98ae6ea

Users can verify the distributed stats.awk using:

    sha256 stats.awk
    md5    stats.awk
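
The `sha256` and `md5` commands above are the FreeBSD tools; on Linux the GNU coreutils equivalents are `sha256sum` and `md5sum`. A minimal sketch of a check that works on either system, with the hypothetical function name `check_sha256`:

```sh
# Hypothetical helper: succeed if FILE has the EXPECTED SHA-256 hex digest.
check_sha256() {
    file=$1
    expected=$2
    if command -v sha256sum >/dev/null 2>&1; then
        # GNU coreutils (Linux): output is "<digest>  <file>"
        actual=$(sha256sum "$file" | awk '{print $1}')
    else
        # FreeBSD: -q prints only the digest
        actual=$(sha256 -q "$file")
    fi
    [ "$actual" = "$expected" ]
}

# Intended use, with the digest listed above:
#   check_sha256 stats.awk 3dc37347c677aac25989dd7ee2f1af3fdade71a5c21a5fbe68ff3d3ad283d576 && echo OK
```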

License

SPDX-License-Identifier: BSD-3-Clause
(c) 2026, Takayuki HOSODA