ASKSAGE: Sage Q&A Forum - RSS feed

Goppa Codes: Cannot create decoder with large parameters

Sat, 27 Mar 2021 14:54:58 +0100

Hello, I've been playing around with linear error-correcting codes with the intention of programming a demonstration of McEliece's Cryptosystem, which utilises Goppa codes. I've been following the format outlined in the SAGE tutorial. I can't post the link for some reason, but it can be found by searching "sagemath Goppa codes". I have no issue running the following code to create a Goppa code over field GF(2^4) and its corresponding decoder. F = GF(2^4) R. = F[] g = x^2+x+1 L = [a for a in F.list if g(a) != 0] C = codes.GoppaCode(g,L) D = C.decoder() However, the cryposystem I'm trying to program requires parameters higher than this, specifically over the field of GF(2^10). The code that I've used to create this Goppa code and its decoder are therefore as follows. F = GF(2^10) R. = F[] g = x^50+x^3+1 L = [a for a in F.list if g(a) != 0] C = codes.GoppaCode(g,L) D = C.decoder() This creates a [1024, 524] Goppa code over GF(2) and there's no issue in generating the code C, or even using it to encode a message vector (of length 524). Unfortunately, I seem to encounter a memory issue when I run the final line to create the decoder D. My initial response was to try it on a machine with larger memory (64GB), but I had no luck there. I assume it's because the code that I'm trying to create is pretty huge, but it would be really helpful if there's somehow a way around the problem to allow me implement this Goppa code and generate its decoder. Any help would be greatly appreciated and I'm happy to provide more details if needed. Thank you very much!!

Answer by dan_fulea for

Hello,

I've been playing around with linear error-correcting codes with the intention of programming a demonstration of McEliece's Cryptosystem, which utilises Goppa codes.

I've been following the format outlined in the SAGE tutorial. I can't post the link for some reason, but it can be found by searching "sagemath Goppa codes". I have no issue running the following code to create a Goppa code over field GF(2^4) and its corresponding decoder.

F = GF(2^4)
R.<x> = F[]
g = x^2+x+1
L = [a for a in F.list if g(a) != 0]
C = codes.GoppaCode(g,L)
D = C.decoder()

However, the cryposystem I'm trying to program requires parameters higher than this, specifically over the field of GF(2^10). The code that I've used to create this Goppa code and its decoder are therefore as follows.

F = GF(2^10)
R.<x> = F[]
g = x^50+x^3+1
L = [a for a in F.list if g(a) != 0]
C = codes.GoppaCode(g,L)
D = C.decoder()

This creates a [1024, 524] Goppa code over GF(2) and there's no issue in generating the code C, or even using it to encode a message vector (of length 524).

Unfortunately, I seem to encounter a memory issue when I run the final line to create the decoder D. My initial response was to try it on a machine with larger memory (64GB), but I had no luck there. I assume it's because the code that I'm trying to create is pretty huge, but it would be really helpful if there's somehow a way around the problem to allow me implement this Goppa code and generate its decoder.

Any help would be greatly appreciated and I'm happy to provide more details if needed.

Thank you very much!!

Thu, 01 Apr 2021 01:32:45 +0200

This is an answer showing why there is a memory issue while building the decoder for a bigger field $F$ of characteristic two and/or a higher degree of the involved polynomial $g\in \Bbb F_2[x]$. To illustrate this, let us consider first the "smaller" case from the question: F = GF(2^4) R. = F[] g = x^2 + x + 1 L = [a for a in F if g(a)] C = codes.GoppaCode(g, L) D = C.decoder() Then the object `D` has a lot of "hidden structure", for instance there is a dictionary `D.__dict__`, it has a lot of keys, sage: for key in D.__dict__: ....: print(key) ....: _maximum_error_weight decoder_type _code _input_space _connected_encoder_name _build_lookup_table _code_covering_radius _code_minimum_distance _decoder_type _lookup_table and for instance `D.__dict__['_lookup_table']` is again a dictionary, sage: print(type(D.__dict__['_lookup_table'])) it has "a lot of keys", translated into "a lot of values". We can count: sage: len(D.__dict__['_lookup_table']) 256 so there are $256$ keys, and some first few lookup translations are... sage: D.__dict__['_lookup_table'] {(0, 0, 0, 0, 0, 0, 0, 0): (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), (1, 0, 0, 0, 0, 0, 0, 0): (1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), (0, 1, 1, 0, 0, 0, 1, 1): (0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), (1, 1, 1, 0, 1, 1, 1, 1): (0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), (1, 1, 0, 1, 1, 1, 1, 0): (0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), (0, 1, 1, 0, 0, 1, 0, 1): (0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0), (1, 0, 0, 1, 0, 1, 1, 0): (0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0), (0, 1, 1, 1, 0, 0, 0, 1): (0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0), (1, 1, 1, 0, 0, 0, 0, 1): (0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0), and so on, totally $256$ key-value pairs. If we try the same with a field $F=\Bbb F_{2^{10}}=$`GF(2^10)` and then further with a polynomial of degree $50$... well, there is no chance to build the corresponding structure. In such cases, maybe an *ad-hoc* solution should and may be given, try to see exactly what is needed from the decoder instance in the intended application. ---------- Some further notes. One can look into the code, e.g. via `C.decoder??` - and the code is mainly building... if decoder_name in self._registered_decoders: decClass = self._registered_decoders[decoder_name] try: return decClass(self, *args, **kwargs) where `self` becomes the instance `C`, and the decoder_name is the default decoder, sage: C._default_decoder_name 'Syndrome' So we can build the corresponding class, and ask for some information... sage: S = C._registered_decoders['Syndrome'] sage: S sage: S?? The double question mark gives... Init signature: S(code, maximum_error_weight=None) Docstring: Constructs a decoder for Linear Codes based on syndrome lookup table. The decoding algorithm works as follows: * First, a lookup table is built by computing the syndrome of every error pattern of weight up to "maximum_error_weight". * Then, whenever one tries to decode a word "r", the syndrome of "r" is computed. The corresponding error pattern is recovered from the pre-computed lookup table. * Finally, the recovered error pattern is subtracted from "r" to recover the original word. "maximum_error_weight" need never exceed the covering radius of the code, since there are then always lower-weight errors with the same syndrome. If one sets "maximum_error_weight" to a value greater than the covering radius, then the covering radius will be determined while building the lookup-table. This lower value is then returned if you query "decoding_radius" after construction. If "maximum_error_weight" is left unspecified or set to a number at least the covering radius of the code, this decoder is complete, i.e. it decodes every vector in the ambient space. Note: Constructing the lookup table takes time exponential in the length of the code and the size of the code's base field. Afterwards, the individual decodings are fast. INPUT: * "code" -- A code associated to this decoder * "maximum_error_weight" -- (default: "None") the maximum number of errors to look for when building the table. An error is raised if it is set greater than n-k, since this is an upper bound on the covering radius on any linear code. If "maximum_error_weight" is kept unspecified, it will be set to n - k, where n is the length of "code" and k its dimension. and a lot of further lines. So a possibility to force some build of a decoder would be... F = GF(2^10) R. = F[] g = x^50 + x^3 + 1 L = [a for a in F if g(a)] C = codes.GoppaCode(g, L) D = C.decoder(maximum_error_weight=1) Above, the option `maximum_error_weight=1` may make the decoder useless for a lot of purposes, but this shows that the code is building `D`. We have in the above case: sage: len(L) 1024 sage: len(D.__dict__['_lookup_table']) 1025 Using `maximum_error_weight=2` instead, after a looonger wait we get a decoder with... sage: len(D.__dict__['_lookup_table']) 524801

Comment by Student for

This is an answer showing why there is a memory issue while building the decoder for a bigger field $F$ of characteristic two and/or a higher degree of the involved polynomial $g\in \Bbb F_2[x]$.

To illustrate this, let us consider first the "smaller" case from the question:

F = GF(2^4)
R.<x> = F[]
g = x^2 + x + 1
L = [a for a in F if g(a)]
C = codes.GoppaCode(g, L)
D = C.decoder()

Then the object D has a lot of "hidden structure", for instance there is a dictionary D.__dict__, it has a lot of keys,

sage: for key in D.__dict__:
....:     print(key)
....: 
_maximum_error_weight
decoder_type
_code
_input_space
_connected_encoder_name
_build_lookup_table
_code_covering_radius
_code_minimum_distance
_decoder_type
_lookup_table

and for instance D.__dict__['_lookup_table'] is again a dictionary,

sage: print(type(D.__dict__['_lookup_table']))
<class 'dict'>

it has "a lot of keys", translated into "a lot of values". We can count:

sage: len(D.__dict__['_lookup_table'])
256

so there are $256$ keys, and some first few lookup translations are...

sage: D.__dict__['_lookup_table']
{(0, 0, 0, 0, 0, 0, 0, 0): (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
 (1, 0, 0, 0, 0, 0, 0, 0): (1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
 (0, 1, 1, 0, 0, 0, 1, 1): (0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
 (1, 1, 1, 0, 1, 1, 1, 1): (0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
 (1, 1, 0, 1, 1, 1, 1, 0): (0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
 (0, 1, 1, 0, 0, 1, 0, 1): (0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0),
 (1, 0, 0, 1, 0, 1, 1, 0): (0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0),
 (0, 1, 1, 1, 0, 0, 0, 1): (0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0),
 (1, 1, 1, 0, 0, 0, 0, 1): (0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0),

and so on, totally $256$ key-value pairs.

If we try the same with a field $F=\Bbb F_{2^{10}}=$GF(2^10) and then further with a polynomial of degree $50$... well, there is no chance to build the corresponding structure. In such cases, maybe an ad-hoc solution should and may be given, try to see exactly what is needed from the decoder instance in the intended application.

Some further notes. One can look into the code, e.g. via C.decoder?? - and the code is mainly building...

    if decoder_name in self._registered_decoders:
        decClass = self._registered_decoders[decoder_name]
        try:
            return decClass(self, *args, **kwargs)

where self becomes the instance C, and the decoder_name is the default decoder,

sage: C._default_decoder_name
'Syndrome'

So we can build the corresponding class, and ask for some information...

sage: S = C._registered_decoders['Syndrome']
sage: S
<class 'sage.coding.linear_code.LinearCodeSyndromeDecoder'>
sage: S??

The double question mark gives...

Init signature: S(code, maximum_error_weight=None)
Docstring:     
   Constructs a decoder for Linear Codes based on syndrome lookup
   table.

   The decoding algorithm works as follows:

   * First, a lookup table is built by computing the syndrome of every
     error pattern of weight up to "maximum_error_weight".

   * Then, whenever one tries to decode a word "r", the syndrome of
     "r" is computed. The corresponding error pattern is recovered
     from the pre-computed lookup table.

   * Finally, the recovered error pattern is subtracted from "r" to
     recover the original word.

   "maximum_error_weight" need never exceed the covering radius of the
   code, since there are then always lower-weight errors with the same
   syndrome. If one sets "maximum_error_weight" to a value greater
   than the covering radius, then the covering radius will be
   determined while building the lookup-table. This lower value is
   then returned if you query "decoding_radius" after construction.

   If "maximum_error_weight" is left unspecified or set to a number at
   least the covering radius of the code, this decoder is complete,
   i.e. it decodes every vector in the ambient space.

   Note:

     Constructing the lookup table takes time exponential in the
     length of the code and the size of the code's base field.
     Afterwards, the individual decodings are fast.

   INPUT:

   * "code" -- A code associated to this decoder

   * "maximum_error_weight" -- (default: "None") the maximum number of
     errors to look for when building the table. An error is raised if
     it is set greater than n-k, since this is an upper bound on the
     covering radius on any linear code. If "maximum_error_weight" is
     kept unspecified, it will be set to n - k, where n is the length
     of "code" and k its dimension.

and a lot of further lines.

So a possibility to force some build of a decoder would be...

F = GF(2^10)
R.<x> = F[]
g = x^50 + x^3 + 1
L = [a for a in F if g(a)]
C = codes.GoppaCode(g, L)
D = C.decoder(maximum_error_weight=1)

Above, the option maximum_error_weight=1 may make the decoder useless for a lot of purposes, but this shows that the code is building D. We have in the above case:

sage: len(L)
1024
sage: len(D.__dict__['_lookup_table'])
1025

Using maximum_error_weight=2 instead, after a looonger wait we get a decoder with...

sage: len(D.__dict__['_lookup_table'])
524801

Fri, 02 Apr 2021 15:16:08 +0200

Thanks a lot for this, it's really helped shed light on the problem! It looks as though the decoder just creates a lookup table for all the possible syndromes and their corresponding codewords, so there's no surprise I run out of memory. Unless there's a more efficient syndrome decoder out there that I'm unaware of, it looks as though creating a decoder with these parameters is out of the question then, which is a shame. This has been very helpful though, so thank you!