(255,329) 2-bit ランダム誤り訂正付きバイナリ・テキスト変換

BCH (255,329) 2-bit ランダム誤り訂正／3-bit 誤り検出符号を利用した
バイナリ・テキスト変換プログラム

Command line help message


bchtx: encode/decode a binary file with error correction v1.45 (Jul. 7 2003)
Copyright (c) 1993-2003, Takayuki Hosoda. All rights reserved.
bchtx is free software with ABSOLUTELY NO WARRANTY.
Synopsis :
  bchtx -a [-f] [-v]  [readfile]
  bchtx -c [-f] [-v] [[readfile] writefile]
  bchtx -d [-f] [-v] [[readfile] writefile]
  bchtx -e [-f] [-v]   readfile [writefile]
  bchtx -w [-f] [-v]
  bchtx [-h]
Note :
  If file name is not specified, stdin or stdout will be used instead.
Options :
  -a : automatically decode readfile to the file of original file name.
  -c : decode readfile to writefile without error correction.
  -d : decode readfile to writefile.
  -e : encode readfile to writefile.
  -f : force write.
  -h : show this message.
  -v : show verbose message.
  -w : write error correction table onto the directory defined by the
       environment variable 'BCHTX'.

Download

bchtx-1.4.5.tar.gz

Technical note

bchtx technical document                                 bchtx version 1.44

               An way to obtain reliable serial file transfer.

                               Takayuki Hosoda

  This program is to experiment the efficiency of error detection and cor-
rection used on rather high speed serial communication link.  To obtain 
error correction capability, it's making use of BCH(255,239) code.
  The features of this program are, two bit error correction and three bit
error detection capability for a block(39 byte), and object data consists 
of only visible seven bit ASCII charactors.

[Acknowledgement]
  BCH code, named from the initial of Bose, Chaudhuri, Hocquenghem, is a
wide class of cyclic code, which include Hamming code, and its expansion.
  It's a most important one of the error correction code, it can be app-
lied to wide range of code length and error correction capability. 
  When used at code length of some thouthand or short, It's redundancy is
relatively small refered to the other error correction code. The more 
important thing is that, it can be decoded by rather easy alithmatic way.
From these reason, BCH code is used in various applications as data transfer.

[BCH code]
  To obtain practical decoding speed, I'm going to use error correction table.
  How to correct is, at first calculate syndrome of received block, If it's
not zero, it indicates some error. Then addressing the table by the syndrome,
we can get error position of received data block. Flipping the bit at error 
position, we can get the corrected data. Because of addressing table by a
syndrome, there are restriction from the size of available memory, I might
think using 64Kword for the table may be a good trade off for the personal
computers nowadays. The table size of 64Kword suit the requirement of BCH
(255,239), two bit error correction out of 255 bit. We'll go on with this
selection.

1.  a positive integer m = 8
    code length n = 2^m - 1 = 2^8 - 1 = 255
2.  number of error correction bit t = 2
    information bit k >= 2^m - 1 - m*t = 2^8 - 1 - 8*2 = 239
    number of check bit n - k = 255 - 239 = 16
3.  1) 1..2t = 1, 2, 3, 4
    2) 1, 3
    3) degree 8
4.  Generator polynomial = 0011011110110001(0x37b1)

[Parity]
  The BCH code descrived above can detect and correct upto two bit errors,
however, there is a need to detect three bit errors to abort data conversion
process. An additional parity bit on BCH code enables it to detect three bit 
errors. This means to increase 'hamming distance' one.

[Data format]
  Because of it's making use of BCH(255,239), the maximun number
of information code is limited to 31 byte. I'll use 29 byte for
the convinience to have an encoded block size of 39 byte, which
is about half of a line length of a general terminal. Making it
double we get an output line.

  The internal BCH encoded data format is shown below.(fig.1)

 (fig.1)
 |<----------------------------255------------------------------->|
 |      |<---------------------234(39*6bit)---------------------->|
 |      |<------------------232(29byte)------------------>|       |
 |<-21->|<----------216(27byte)------->|<-------16------->|-1-|-1-|
 |______|______________________________|__________________|___|___|
 X______X______________________________X__________________X___X___X
    |   |        information data      |     check bit    | |   |
    |   |MSB                           |               LSB| |   |
    |   |data[0]...............data[28]|data[29]..data[30]| |   |
    |                                                       |   |
    +--- not used                        padding data '0' --+   |
                                                  parity bit ---+

  The BCH encoded data is encoded again to visible seven bit ASCII
charactors. The reason is to make it possible to transfer a
binary file through seven bit serial link, or through MoDem.

[Binary to text conversion]
 (fig.2)
 MSB                         LSB
  7   6   5   4   3   2   1   0
 _______________________________ 
|data[0]................|data[1]|  Each 3 byte of binary(original)
|_______________ _______|_______|  data will be separated to four
|........data[1]|data[2]........|  charactor of 6 bit length, and
|_______|_______|_______ _______|  encoded to 7 bit ASCII charac-
|data[2]|data[3]................|  tor. Conversion table is shown
|_______|_______________________|  below(fig.3).  
|data[4]................|data[5]|  
|                       |       |  

   The feature of this conversion table is its easiness to decode
 back to original six bit, All you have to do is to clear upper
 bit.

 (fig.3)       |       l o w e r   b i t 
               |____________________________________ 
               |                      1 1 1 1 1 1
               |  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
 ________ _____|____________________________________
         |  0  |  @ A B C D E F G H I J K L M N O  
    U    |_____|____________________________________
    P    |  1  |  P Q R S T U V W X Y Z [ \ ] ^ _  
    P    |_____|____________________________________ 
    E    |  2  |  ` a b c d e f g h i j k l m n o  
    R    |_____|____________________________________
         |  3  |  0 1 2 3 4 5 6 7 8 9 : ; < = > ?  
 ________|_____|____________________________________


[Text to binary conversion]

 (fig.4)
 MSB                         LSB
  7   6   5   4   3   2   1   0
 _______ _______________________ 
|xxxxxxx|data[0]................|  Each four lower 6 bit data will
|_______|_______ _______________|  be concatinated to three  8 bit 
|xxxxxxx|data[0]| data[1].......|  binary data. Upper two bit will
|_______|_______|_______ _______|  be negliected.
|xxxxxxx|........data[1]|data[2]|
|_______|_______________|_______|
|xxxxxxx|................data[2]|
|_______|_______________________|
|xxxxxxx|data[3]................|
|       |                       |


[bchtx file format]

  The BCH encoded output file is consist of 2 block/line. Each line is beggining
with 'p' that is used for start-of-heading code. The reason why beggining with
'p' is as follows, note that 'p' is explained as '1110000' in binary, and any 
two bit error won't cause it to be a control code ('0000000' to '00011111'),
nor DEL code ('1111111'). Using 'p' as the SOH code, we can ignore all kind of
line terminator which varies by system, i.e. LF for UNIX, CRLF for MS-DOS, CR 
for OS-9.
  Thus we understood we can ignore line terminator code, however, there still be 
a line terminator probrem. Imagine a error causes a charactor to be a control 
code, for example, a flip of 7th bit of charactor 'J' (1001010) makes LF code
(0001010), then, on some system in which a line terminator LF will translated to
CRLF. This translation causes unrecoverable error to the data. With BCH code, we
can correct bit errors, however, don't know how to delete an inserted charactor.
  For the worse, LF to CR(0001101) conversion brings it to out of reach of two 
bit error correction capability.

  Before the first block is a comment line that shows original file name and
total number of lines. The first block is used as syncronize word to find start 
of BCH encoded text, in which contains the bchtx signiture. The second block is 
used as header in which contains size of the original file, header extension 
flag, number of extended header. The space not used in the header block is 
reserved for future use. The third to (m-1)'th block is used for extended header, 
in which contains the original file name so far, the block can be expanded to 
contain the file name upto 255 charactors of maximum name length. The m'th to 
(n-3)'th block is main body of encoded data. (n-2)'th to (n-1)'th two block is 
to contain original file's 32 bit Cyclic Redundancy Check(CRC32). This is a 
variation of CCITT's CRC32 (explaned later in this file).

(fig.5)
 ___________________________________________________
  block number |  comment
 ______________|____________________________________
 ______________|____________________________________
       0       |  comment field 
 ______________|____________________________________
       1       |  signiture word
 ______________|____________________________________
       2       |  header
 ______________|____________________________________
       3..m-1  |  extended header                   
 ______________|____________________________________
       m..n-3  |  main body
 ______________|____________________________________
       n-2..n-1|  CRC
 ______________|____________________________________
       n       |  footer comment [EOF]
 ______________|____________________________________

[CRC32]
   Though BCH's error correction ability is powerful, however, it could not
 check the error out of the block. For this reason the other way is required
 to check the integrity of decoded file. I chose CRC32 for the answer.
   For the CRC32 is a bit short for the maximum file size of 2Gbyte, consi-
 dered to expand CRC32 pararelly to 8 bit by 32 lengh.
   The original files' CRC is calculated  and added  at the end of encoded
 file to check the integrity at the decode end. 
 
 (fig.6)
                <- 8bit ->
                    +----------------+
                ____|____            |    CRC is calculated through 32
          X^31 |_________|           |   stages simulated linier feedback 
          X^30 |_________|           |   shift registor(LFSR).
          X^29 |_________|           |     The generator polynomial is same
          X^28 |_________|           |   as CCITT's CRC32.
          X^27 |_________|           |
          X^26 |____X____|-<--+      |    'Exclusive or' (Half addition) is
          X^25 |_________|    |      |   done at 'X' position.
          X^24 |_________|    |      |
          X^23 |____X____|-<--+      |
          X^22 |____X____|-<--+      |
          X^21 |_________|    |      |
          X^20 |_________|    |      |
          X^19 |_________|    |      |
          X^18 |_________|    |      |
          X^17 |_________|    |      |
          X^16 |____X____|-<--+      |
          X^15 |_________|    |      |
          X^14 |_________|    |      |
          X^13 |_________|    |      |
          X^12 |____X____|-<--+      |
          X^11 |____X____|-<--+      |
          X^10 |____X____|-<--+      |
          X^ 9 |_________|    |      |
          X^ 8 |____X____|-<--+      |
          X^ 7 |____X____|-<--+      |
          X^ 6 |_________|    |      |
          X^ 5 |____X____|-<--+      |
          X^ 4 |____X____|-<--+      |
          X^ 3 |_________|    |      |
          X^ 2 |____X____|-<--+      |
          X^ 1 |____X____|-<--+      |
          X^ 0 |____X____|-<--+      |
                    ^         |      |
                    |         |      |
                    +------->-+      |
                    |                |
                    X-<--------------+
                    |
                input data

[Reference]
   1. Hashimoto Kiyoshi:"Jyouhou fugou riron nyumon",Morikita shuppan(1984)
      ISBN4-627-82050-X
   2. Miyagawa Hiroshi, Harajima Hiroshi, Imai Hideki:"Jyouhou to fugou no
      riron",Iwanami shoten(1983) ISBN4-00-010154