NAME

libPARI - Programming PARI in Library Mode


DESCRIPTION

The User's Guide to Pari/GP gives in three chapters a general presentation of the system, of the gp calculator, and detailed explanation of high level PARI routines available through the calculator. The present manual assumes general familiarity with the contents of these chapters and the basics of ANSI C programming, and focuses on the usage of the PARI library. In this chapter, we introduce the general concepts of PARI programming and describe useful general purpose functions. Chapter 5 describes all available public low-level functions.


Introduction: initializations, universal objects

To use PARI in library mode, you must write a C program and link it to the PARI library. See the installation guide or the Appendix to the User's Guide to Pari/GP on how to create and install the library and include files. A sample Makefile is presented in Appendix A, and a more elaborate one in examples/Makefile. The best way to understand how programming is done is to work through a complete example. We will write such a program in Label se:prog. Before doing this, a few explanations are in order.

First, one must explain to the outside world what kind of objects and routines we are going to use. This is done with the directive

  #include <pari.h>

In particular, this header defines the fundamental type for all PARI objects: the type GEN, which is simply a pointer to long.

Before any PARI routine is called, one must initialize the system, and in particular the PARI stack which is both a scratchboard and a repository for computed objects. This is done with a call to the function

void pari_init(size_t size, ulong maxprime)

The first argument is the number of bytes given to PARI to work with, and the second is the upper limit on a precomputed prime number table; size should not reasonably be taken below 500000 but you may set maxprime = 0, although the system still needs to precompute all primes up to about 2^{16}.

We have now at our disposal:

* a PARI stack containing nothing. It is a big connected chunk of size bytes of memory. All your computations take place here. In large computations, unwanted intermediate results quickly clutter up memory so some kind of garbage collecting is needed. Most large systems do garbage collecting when the memory is getting scarce, and this slows down the performance. PARI takes a different approach: you must do your own cleaning up when the intermediate results are not needed anymore. Special purpose routines have been written to do this; we will see later how (and when) you should use them.

* the following universal objects (by definition, objects which do not belong to the stack): the integers 0, 1, -1 and 2 (respectively called gen_0, gen_1, gen_m1 and gen_2), the fraction (1)/(2) (ghalf), the complex number i (gi). All of these are of type GEN.

In addition, space is reserved for the polynomials x_v (pol_x[v]), and the polynomials 1_v (pol_1[v]). Here, x_v is the name of variable number v, where 0 <= v <= X<MAXVARN>MAXVARN. Both pol_1 and pol_x are arrays of GENs, the index being the polynomial variable number.

However, except for the ones corresponding to variables 0 and MAXVARN, these polynomials are not created upon initialization. It is the programmer's responsibility to fill them before use. We will see how this is done in Label se:vars (never through direct assignment).

* a heap which is just a linked list of permanent universal objects. For now, it contains exactly the ones listed above. You will probably very rarely use the heap yourself; and if so, only as a collection of copies of objects taken from the stack (called clones in the sequel). Thus you need not bother with its internal structure, which may change as PARI evolves. Some complex PARI functions create clones for special garbage collecting purposes, usually destroying them when returning.

* a table of primes (in fact of differences between consecutive primes), called diffptr, of type byteptr (pointer to unsigned char). Its use is described in appendix B.

* access to all the built-in functions of the PARI library. These are declared to the outside world when you include pari.h, but need the above things to function properly. So if you forget the call to pari_init, you will get a fatal error when running your program.


Important technical notes

Types.

Although PARI objects all have the C type GEN, we will freely use the word type to refer to PARI dynamic subtypes: t_INT, t_REAL, etc. The declaration

    GEN x;

declares a C variable of type GEN, but its ``value'' will be said to have type t_INT, t_REAL, etc. The meaning should always be clear from the context.

Type recursivity.

Conceptually, most PARI types are recursive. But the GEN type is a pointer to long, not to GEN. So special macros must be used to access GEN's components. The simplest one is gel(V, i), where el stands for element, to access component number i of the GEN V. This is a valid lvalue (may be put on the left side of an assignment), and the following two constructions are exceedingly frequent

    gel(V, i) = x;
    x = gel(V, i);

where x and V are GENs. This macro accesses and modifies directly the components of V and do not create a copy of the coefficient, contrary to all the library functions.

More generally, to retrieve the values of elements of lists of...of lists of vectors we have the gmael macros (for multidimensional array element). The syntax is gmaeln(V,a_1,...,a_n), where V is a GEN, the a_i are indexes, and n is an integer between 1 and 5. This stands for x[a_1][a_2]...[a_n], and returns a GEN. The macros gel (resp. gmael) are synonyms for gmael1 (resp. gmael2).

Finally, the macro gcoeff(M, i, j) has exactly the meaning of M[i,j] in GP when M is a matrix. Note that due to the implementation of t_MATs as horizontal lists of vertical vectors, gcoeff(x,y) is actually equivalent to gmael(y,x). One should use gcoeff in matrix context, and gmael otherwise.

Variations on basic functions

In the library syntax descriptions in Chapter 3, we have only given the basic names of the functions. For example gadd(x,y) assumes that x and y are GENs, and creates the result x+y on the PARI stack. For most of the basic operators and functions, many other variants are available. We give some examples for gadd, but the same is true for all the basic operators, as well as for some simple common functions (a complete list is given in Chapter 5):

GEN gaddgs(GEN x, long y)

GEN gaddsg(long x, GEN y)

In the following three, z is a preexisting GEN and the result of the corresponding operation is put into z. The size of the PARI stack does not change:

void gaddz(GEN x, GEN y, GEN z)

void gaddgsz(GEN x, long y, GEN z)

void gaddsgz(GEN x, GEN y, GEN z)

There are also low level functions which are special cases of the above:

GEN addii(GEN x, GEN y): here x and y are GENs of type t_INT (this is not checked).

GEN addrr(GEN x, GEN y): here x and y are GENs of type t_REAL (this is not checked).

There also exist functions addir, addri, mpadd (whose two arguments can be of type t_INT or t_REAL), addis (to add a t_INT and a long) and so on.

All these specialized functions are of course more efficient than the general purpose ones, but note the hidden danger here: the types of the objects involved, if they are themselves results of a previous computation, are not completely predetermined. For instance the multiplication of a t_REAL by a t_INT usually gives a t_REAL result, except when the integer is 0, in which case according to the PARI philosophy the result is the exact integer 0. Hence if afterwards you call a function which specifically needs a t_REAL argument, you are in trouble.

The names are self-explanatory once you know that i stands for a t_INT, r for a t_REAL, mp for i or r, s for a signed C long integer, u for an unsigned C long integer; finally the suffix z means that the result is not created on the PARI stack but assigned to a preexisting GEN object passed as an extra argument. For completeness, Chapter 5 gives a description of these low-level functions.

Portability: 32-bit / 64-bit architectures.

PARI supports both 32-bit and 64-bit based machines, but not simultaneously! The library will have been compiled assuming a given architecture, and some of the header files you include (through pari.h) will have been modified to match the library.

Portable macros are defined to bypass most machine dependencies. If you want your programs to run identically on 32-bit and 64-bit machines, you have to use these, and not the corresponding numeric values, whenever the precise size of your long integers might matter. Here are the most important ones:

  64-bit    32-bit  X<BITS_IN_LONG>C<BITS_IN_LONG>    64        32 
X<LONG_IS_64BIT>C<LONG_IS_64BIT>   defined   undefined 
X<DEFAULTPREC>C<DEFAULTPREC>     3         4   (C< ~ > 19 decimal digits, 
see formula below) 
X<MEDDEFAULTPREC>C<MEDDEFAULTPREC>  4         6   (C< ~ > 38 decimal digits) 
X<BIGDEFAULTPREC>C<BIGDEFAULTPREC>  5         8   (C< ~ > 57 decimal digits) 
For instance, suppose you call a transcendental function, such as

GEN gexp(GEN x, long prec).

The last argument prec is only used if x is an exact object, otherwise the relative precision is determined by the precision of x. But since prec sets the size of the inexact result counted in (long) words (including codewords), the same value of prec will yield different results on 32-bit and 64-bit machines. Real numbers have two codewords (see Label se:impl), so the formula for computing the bit accuracy is

   bit_accuracy(prec) = (prec - 2) * X<BITS_IN_LONG>BITS_IN_LONG

(this is actually the definition of a macro). The corresponding accuracy expressed in decimal digits would be

   bit_accuracy(prec) * log (2) / log (10).

For example if the value of prec is 5, the corresponding accuracy for 32-bit machines is (5-2)* log (2^{32})/ log (10) ~ 28 decimal digits, while for 64-bit machines it is (5-2)* log (2^{64})/ log (10) ~ 57 decimal digits.

Thus, you must take care to change the prec parameter you are supplying according to the bit size, either using the default precisions given by the various DEFAULTPRECs, or by using conditional constructs of the form:

  #ifndef LONG_IS_64BIT
    prec = 4;
  #else
    prec = 6;
  #endif

which is in this case equivalent to the statement prec = MEDDEFAULTPREC;.

Note that for parity reasons, half the accuracies available on 32-bit architectures (the odd ones) have no precise equivalents on 64-bit machines.


Garbage collection

Why and how.

As we have seen, the pari_init routine allocates a big range of addresses, the stack, that are going to be used throughout. Recall that all PARI objects are pointers. Except for a few universal objects, they all point at some part of the stack.

The stack starts at the address bot and ends just before top. This means that the quantity

   (top - bot)/sizeof(long)

is (roughly) equal to the size argument of pari_init. The PARI stack also has a ``current stack pointer'' called avma, which stands for available memory address. These three variables are global (declared by pari.h). They are of type pari_sp, which means pari stack pointer.

The stack is oriented upside-down: the more recent an object, the closer to bot. Accordingly, initially avma = top, and avma gets decremented as new objects are created. As its name indicates, avma always points just after the first free address on the stack, and (GEN)avma is always (a pointer to) the latest created object. When avma reaches bot, the stack overflows, aborting all computations, and an error message is issued. To avoid this you need to clean up the stack from time to time, when intermediate objects are not needed anymore. This is called ``garbage collecting.''

We are now going to describe briefly how this is done. We will see many concrete examples in the next subsection.

* First, PARI routines do their own garbage collecting, which means that whenever a documented function from the library returns, only its result(s) have been added to the stack (non-documented ones may not do this). In particular, a PARI function that does not return a GEN does not clutter the stack. Thus, if your computation is small enough (e.g. you call few PARI routines, or most of them return long integers), then you do not need to do any garbage collecting. This is probably the case in many of your subroutines. Of course the objects that were on the stack before the function call are left alone. Except for the ones listed below, PARI functions only collect their own garbage.

* It may happen that all objects that were created after a certain point can be deleted --- for instance, if the final result you need is not a GEN, or if some search proved futile. Then, it is enough to record the value of avma just before the first garbage is created, and restore it upon exit:

  pari_sp av = avma; /* record initial avma */
  garbage ...
  avma = av; /* restore it */

All objects created in the garbage zone will eventually be overwritten: they should not be accessed anymore once avma has been restored.

* If you want to destroy (i.e. give back the memory occupied by) the latest PARI object on the stack (e.g. the latest one obtained from a function call), you can use the function

void cgiv(GEN z)

where z is the object you want to give back. This is equivalent to the above where the initial av is computed from z.

* Unfortunately life is not so simple, and sometimes you will want to give back accumulated garbage during a computation without losing recent data. For this you need the gerepile function (or one of its simpler variants described hereafter):

GEN gerepile(pari_sp ltop, pari_sp lbot, GEN q)

This function cleans up the stack between ltop and lbot, where lbot < ltop, and returns the updated object q. This means:

1) we translate (copy) all the objects in the interval [avma, lbot[, so that its right extremity abuts the address ltop. Graphically

               bot           avma   lbot          ltop     top
  End of stack  |-------------[++++++[-/-/-/-/-/-/-|++++++++|  Start
                  free memory            garbage

becomes:

               bot                         avma   ltop     top
  End of stack  |---------------------------[++++++[++++++++|  Start
                         free memory

where ++ denote significant objects, -- the unused part of the stack, and -/- the garbage we remove.

2) The function then inspects all the PARI objects between avma and lbot (i.e. the ones that we want to keep and that have been translated) and looks at every component of such an object which is not a codeword. Each such component is a pointer to an object whose address is either

--- between avma and lbot, in which case it is suitably updated,

--- larger than or equal to ltop, in which case it does not change, or

--- between lbot and ltop in which case gerepile raises an error (``significant pointers lost in gerepile'').

3) avma is updated (we add ltop - lbot to the old value).

4) We return the (possibly updated) object q: if q initially pointed between avma and lbot, we return the updated address, as in 2). If not, the original address is still valid, and is returned!

As stated above, no component of the remaining objects (in particular q) should belong to the erased segment [lbot, ltop[, and this is checked within gerepile. But beware as well that the addresses of the objects in the translated zone change after a call to gerepile, so you must not access any pointer which previously pointed into the zone below ltop. If you need to recover more than one object, use one of the gerepilemany functions below.

As a consequence of the preceding explanation, if a PARI object is to be relocated by gerepile then, apart from universal objects, the chunks of memory used by its components should be in consecutive memory locations. All GENs created by documented PARI functions are guaranteed to satisfy this. This is because the gerepile function knows only about two connected zones: the garbage that is erased (between lbot and ltop) and the significant pointers that are copied and updated. If there is garbage interspersed with your objects, disaster occurs when we try to update them and consider the corresponding ``pointers''. In most cases of course the said garbage is in fact a bunch of other GENs, in which case we simply waste time copying and updating them for nothing. But be wary when you allow objects to become disconnected.

In practice this is achieved by the following programming idiom:

    ltop = avma; garbage(); lbot = avma; q = anything();
    return gerepile(ltop, lbot, q); /* returns the updated q */

Beware that

    ltop = avma; garbage();
    return gerepile(ltop, avma, anything())

might work, but should be frowned upon. We cannot predict whether avma is evaluated after or before the call to anything(): it depends on the compiler. If we are out of luck, it is after the call, so the result belongs to the garbage zone and the gerepile statement becomes equivalent to avma = ltop. Thus we return a pointer to random garbage.

* A simple variant is

GEN gerepileupto(pari_sp ltop, GEN q)

which cleans the stack between ltop and the connected object q and returns q updated. For this to work, q must have been created before all its components, otherwise they would belong to the garbage zone! Unless mentioned otherwise, documented PARI functions guarantee this.

* Another variant (a special case of gerepileall below, where n = 1) is

GEN gerepilecopy(pari_sp ltop, GEN x))

which is functionally equivalent to gerepileupto(ltop, gcopy(x)) but more efficient. In this case, the GEN parameter x need not satisfy any property before the garbage collection (it may be disconnected, components created before the root and so on). Of course, this is less efficient than either gerepileupto or gerepile, because x has to be copied to a clean stack zone first.

* To cope with complicated cases where many objects have to be preserved, you can use

void gerepileall(pari_sp ltop, int n, ...)

where the routine expects n further arguments, which are the addresses of the GENs you want to preserve. It cleans up the most recent part of the stack (between ltop and avma), updating all the GENs added to the argument list. A copy is done just before the cleaning to preserve them, so they do not need to be connected before the call. With gerepilecopy, this is the most robust of the gerepile functions (the less prone to user error), hence the slowest.

An alternative syntax, obsolete but kept for backward compatibility, is given by

void gerepilemany(pari_sp ltop, GEN *gptr[], int n)

which works exactly as above, except that the preserved GENs are the elements of the array gptr (of length n): gptr[0], gptr[1],..., gptr[n-1].

* More efficient, but tricky to use is

void gerepilemanysp(pari_sp ltop, pari_sp lbot, GEN *gptr[], int n)

which cleans the stack between lbot and ltop and updates the GENs pointed at by the elements of gptr without doing any copying. This is subject to the same restrictions as gerepile, the only difference being that more than one address gets updated.

Examples.

gerepile
Let x and y be two preexisting PARI objects and suppose that we want to compute x^2 + y^2. This is done using the following program:
    GEN p1 = gsqr(x);
    GEN p2 = gsqr(y), z = gadd(p1,p2);

The GEN z indeed points at the desired quantity. However, consider the stack: it contains as unnecessary garbage p1 and p2. More precisely it contains (in this order) z, p2, p1. (Recall that, since the stack grows downward from the top, the most recent object comes first.)

It is not possible to get rid of p1, p2 before z is computed, since they are used in the final operation. We cannot record avma before p1 is computed and restore it later, since this would destroy z as well. It is not possible either to use the function cgiv since p1 and p2 are not at the bottom of the stack and we do not want to give back z.

But using gerepile, we can give back the memory locations corresponding to p1, p2, and move the object z upwards so that no space is lost. Specifically:

    pari_sp ltop = avma;  /* remember the current address of the top of the stack */
    GEN p1 = gsqr(x);
    GEN p2 = gsqr(y);
    pari_sp lbot = avma;  /* keep the address of the bottom of the garbage pile */
    GEN z = gadd(p1, p2); /* z is now the last object on the stack */
    z = gerepile(ltop, lbot, z);

Of course, the last two instructions could also have been written more simply:

    z = gerepile(ltop, lbot, gadd(p1,p2));

In fact gerepileupto is even simpler to use, because the result of gadd is the last object on the stack and gadd is guaranteed to return an object suitable for gerepileupto:

    ltop = avma;
    z = gerepileupto(ltop, gadd(gsqr(x), gsqr(y)));

Make sure you understand exactly what has happened before you go on (use the figure from the preceding section).

Remark on assignments and gerepile: When the tree structure and the size of the PARI objects which will appear in a computation are under control, one may allocate sufficiently large objects at the beginning, use assignment statements, then simply restore avma. Coming back to the above example, note that if we know that x and y are of type real fitting into DEFAULTPREC words, we can program without using gerepile at all:

    z = cgetr(DEFAULTPREC); ltop = avma;
    gaffect(gadd(gsqr(x), gsqr(y)), z);
    avma = ltop;

This is often slower than a craftily used gerepile though, and certainly more cumbersome to use. As a rule, assignment statements should generally be avoided.

Variations on a theme: it is often necessary to do several gerepiles during a computation. However, the fewer the better. The only condition for gerepile to work is that the garbage be connected. If the computation can be arranged so that there is a minimal number of connected pieces of garbage, then it should be done that way.

For example suppose we want to write a function of two GEN variables x and y which creates the vector [x^2+y, y^2+x]. Without garbage collecting, one would write:

    p1 = gsqr(x); p2 = gadd(p1, y);
    p3 = gsqr(y); p4 = gadd(p3, x); z = cgetg(3, t_VEC);
    gel(z, 1) = p2;
    gel(z, 2) = p4;

This leaves a dirty stack containing (in this order) z, p4, p3, p2, p1. The garbage here consists of p1 and p3, which are separated by p2. But if we compute p3 before p2 then the garbage becomes connected, and we get the following program with garbage collecting:

    ltop = avma; p1 = gsqr(x); p3 = gsqr(y);
    lbot = avma; z = cgetg(3, t_VEC);
    gel(z, 1) = gadd(p1,y);
    gel(z, 2) = gadd(p3,x); z = gerepile(ltop,lbot,z);

Finishing by z = gerepileupto(ltop, z) would be ok as well. Beware that

    ltop = avma; p1 = gadd(gsqr(x), y); p3 = gadd(gsqr(y), x);
    z = cgetg(3, t_VEC);
    gel(z, 1) = p1;
    gel(z, 2) = p3; z = gerepileupto(ltop,z); /* WRONG */

is a disaster since p1 and p3 are created before z, so the call to gerepileupto overwrites them, leaving gel(z, 1) and gel(z, 2) pointing at random data! On the other hand

    ltop = avma; z = cgetg(3, t_VEC);
    gel(z, 1) = gadd(gsqr(x), y);
    gel(z, 2) = gadd(gsqr(y), x); z = gerepileupto(ltop,z); /* INEFFICIENT */

leaves the results of gsqr(x) and gsqr(y) on the stack (and lets gerepileupto update them for naught). Finally, the most elegant and efficient version (with respect to time and memory use) is as follows

    z = cgetg(3, t_VEC);
    ltop = avma; gel(z, 1) = gerepileupto(ltop, gadd(gsqr(x), y));
    ltop = avma; gel(z, 2) = gerepileupto(ltop, gadd(gsqr(y), x));

which avoids updating the container z and cleans up its components individually, as soon as they are computed.

One last example. Let us compute the product of two complex numbers x and y, using the 3M method which requires 3 multiplications instead of the obvious 4. Let z = x*y, and set x = x_r + i*x_i and similarly for y and z. We compute p_1 = x_r*y_r, p_2 = x_i*y_i, p_3 = (x_r+x_i)*(y_r+y_i), and then we have z_r = p_1-p_2, z_i = p_3-(p_1+p_2). The program is as follows:

  ltop = avma;
  p1 = gmul(gel(x,1), gel(y,1));
  p2 = gmul(gel(x,2), gel(y,2));
  p3 = gmul(gadd(gel(x,1), gel(x,2)), gadd(gel(y,1), gel(y,2)));
  p4 = gadd(p1,p2);
  lbot = avma; z = cgetg(3, t_COMPLEX);
  gel(z, 1) = gsub(p1,p2);
  gel(z, 2) = gsub(p3,p4); z = gerepile(ltop,lbot,z);

Exercise. Write a function which multiplies a matrix by a column vector. Hint: start with a cgetg of the result, and use gerepile whenever a coefficient of the result vector is computed. You can look at the answer in src/basemath/gen1.c:MC_mul().

gerepileall
Let us now see why we may need the gerepileall variants. Although it is not an infrequent occurrence, we do not give a specific example but a general one: suppose that we want to do a computation (usually inside a larger function) producing more than one PARI object as a result, say two for instance. Then even if we set up the work properly, before cleaning up we have a stack which has the desired results z1, z2 (say), and then connected garbage from lbot to ltop. If we write
    z1 = gerepile(ltop, lbot, z1);

then the stack is cleaned, the pointers fixed up, but we have lost the address of z2. This is where we need the gerepileall function:

    gerepileall(ltop, 2, &z1, &z2)

copies z1 and z2 to new locations, cleans the stack from ltop to the old avma, and updates the pointers z1 and z2. Here we do not assume anything about the stack: the garbage can be disconnected and z1, z2 need not be at the bottom of the stack. If all of these assumptions are in fact satisfied, then we can call gerepilemanysp instead, which is usually faster since we do not need the initial copy (on the other hand, it is less cache friendly).

A most important usage is ``random'' garbage collection during loops whose size requirements we cannot (or do not bother to) control in advance:

    pari_sp ltop = avma, limit = stack_lim(avma, 1);
    GEN x, y;
    while (...)
    {
      garbage(); x = anything();
      garbage(); y = anything(); garbage();
      if (avma < limit) /* memory is running low (half spent since entry) */
        gerepileall(ltop, 2, &x, &y);
    }

Here we assume that only x and y are needed from one iteration to the next. As it would be costly to call gerepile once for each iteration, we only do it when it seems to have become necessary. The macro stack_lim(avma,n) denotes an address where 2^{n-1} / (2^{n-1}+1) of the remaining stack space is exhausted (1/2 for n = 1, 2/3 for n = 2).

Comments.

First, gerepile has turned out to be a flexible and fast garbage collector for number-theoretic computations, which compares favorably with more sophisticated methods used in other systems. Our benchmarks indicate that the price paid for using gerepile and gerepile-related copies, when properly used, is usually less than 1 percent of the total running time, which is quite acceptable!

Second, it is of course harder on the programmer, and quite error-prone if you do not stick to a consistent PARI programming style. If all seems lost, just use gerepilecopy (or gerepileall) to fix up the stack for you. You can always optimize later when you have sorted out exactly which routines are crucial and what objects need to be preserved and their usual sizes.

If you followed us this far, congratulations, and rejoice: the rest is much easier.


Creation of PARI objects, assignments, conversions

Creation of PARI objects

The basic function which creates a PARI object is the function cgetg whose prototype is:

GEN cgetg(long length, long type).

Here length specifies the number of longwords to be allocated to the object, and type is the type number of the object, preferably in symbolic form (see Label se:impl for the list of these). The precise effect of this function is as follows: it first creates on the PARI stack a chunk of memory of size length longwords, and saves the address of the chunk which it will in the end return. If the stack has been used up, a message to the effect that ``the PARI stack overflows'' is printed, and an error raised. Otherwise, it sets the type and length of the PARI object. In effect, it fills its first codeword (z[0] or *z). Many PARI objects also have a second codeword (types t_INT, t_REAL, t_PADIC, t_POL, and t_SER). In case you want to produce one of those from scratch, which should be exceedingly rare, it is your responsibility to fill this second codeword, either explicitly (using the macros described in Label se:impl), or implicitly using an assignment statement (using gaffect).

Note that the argument length is predetermined for a number of types: 3 for types t_INTMOD, t_FRAC, t_COMPLEX, t_POLMOD, t_RFRAC, 4 for type t_QUAD and t_QFI, and 5 for type t_PADIC and t_QFR. However for the sake of efficiency, no checking is done in the function cgetg, so disasters will occur if you give an incorrect length.

Notes: 1) The main use of this function is create efficiently a constant object, or to prepare for later assignments (see Label se:assign). Most of the time you will use GEN objects as they are created and returned by PARI functions. In this case you do not need to use cgetg to create space to hold them.

2) For the creation of leaves, i.e. t_INT or t_REAL,

GEN cgeti(long length)

GEN cgetr(long length)

should be used instead of cgetg(length, t_INT) and cgetg(length, t_REAL) respectively. Finally

GEN cgetc(long prec)

creates a t_COMPLEX whose real and imaginary part are t_REALs allocated by cgetr(prec).

Examples: 1) Both z = cgeti(DEFAULTPREC) and cgetg(DEFAULTPREC, t_INT) create a t_INT whose ``precision'' is bit_accuracy(DEFAULTPREC) = 64. This means z can hold rational integers of absolute value less than 2^{64}. Note that in both cases, the second codeword is not filled. Of course we could use numerical values, e.g. cgeti(4), but this would have different meanings on different machines as bit_accuracy(4) equals 64 on 32-bit machines, but 128 on 64-bit machines.

2) The following creates a complex number whose real and imaginary parts can hold real numbers of precision bit_accuracy(MEDDEFAULTPREC) = 96 bits:

    z = cgetg(3, t_COMPLEX);
    gel(z, 1) = cgetr(MEDDEFAULTPREC);
    gel(z, 2) = cgetr(MEDDEFAULTPREC);

or simply z = cgetc(MEDDEFAULTPREC).

3) To create a matrix object for 4 x 3 matrices:

    z = cgetg(4, t_MAT);
    for(i=1; i<4; i++) gel(z, i) = cgetg(5, t_COL);

If one wishes to create space for the matrix elements themselves, one has to follow this with a double loop to fill each column vector.

These last two examples illustrate the fact that since PARI types are recursive, all the branches of the tree must be created. The function cgetg creates only the ``root'', and other calls to cgetg must be made to produce the whole tree. For matrices, a common mistake is to think that z = cgetg(4, t_MAT) (for example) creates the root of the matrix: one needs also to create the column vectors of the matrix (obviously, since we specified only one dimension in the first cgetg!). This is because a matrix is really just a row vector of column vectors (hence a priori not a basic type), but it has been given a special type number so that operations with matrices become possible.

Finally, to facilitate input of constant objects when speed is not paramount, there are four varargs functions:

GEN mkintn(long n, ...) returns the non-negative t_INT whose development in base 2^{32} is given by the following n words (unsigned long). It is assumed that all such arguments are less than 2^{32} (the actual sizeof(long) is irrelevant, the behaviour is also as above on 64-bit machines).

    mkintn(3, a2, a1, a0);

returns a_2 2^{64} + a_1 2^{32} + a_0.

GEN mkpoln(long n, ...) Returns the t_POL whose n coefficients (GEN) follow, in order of decreasing degree.

    mkpoln(3, gen_1, gen_2, gen_0);

returns the polynomial X^2 + 2X (in variable 0, use setvarn if you want other variable numbers). Beware that n is the number of coefficients, hence one more than the degree.

GEN mkvecn(long n, ...) returns the t_VEC whose n coefficients (GEN) follow.

GEN mkcoln(long n, ...) returns the t_COL whose n coefficients (GEN) follow.

Warning: Contrary to the policy of general PARI functions, the latter three functions do not copy their arguments, nor do they produce an object a priori suitable for gerepileupto. For instance

    /* gerepile-safe: components are universal objects */
    z = mkvecn(3, gen_1, gen_0, gen_2);
    /* not OK for gerepileupto: stoi(3) creates component before root */
    z = mkvecn(3, stoi(3), gen_0, gen_2);
    /* NO! First vector component C<x> is destroyed */
    x = gclone(gen_1);
    z = mkvecn(3, x, gen_0, gen_2);
    gunclone(x);

The following function is also available as a special case of mkintn:

GEN u2toi(ulong a, ulong b)

Returns the GEN equal to 2^{32} a + b, assuming that a,b < 2^{32}. This does not depend on sizeof(long): the behaviour is as above on both 32 and 64-bit machines.

Assignments.

Firstly, if x and y are both declared as GEN (i.e. pointers to something), the ordinary C assignment y = x makes perfect sense: we are just moving a pointer around. However, physically modifying either x or y (for instance, x[1] = 0) also changes the other one, which is usually not desirable.

Very important note: Using the functions described in this paragraph is inefficient and often awkward: one of the gerepile functions (see Label se:garbage) should be preferred. See the paragraph end for one exception to this rule.

The general PARI assignment function is the function gaffect with the following syntax:

void gaffect(GEN x, GEN y)

Its effect is to assign the PARI object x into the preexisting object y. This copies the whole structure of x into y so many conditions must be met for the assignment to be possible. For instance it is allowed to assign a t_INT into a t_REAL, but the converse is forbidden. For that, you must use the truncation or rounding function of your choice (see section 3.2). It can also happen that y is not large enough or does not have the proper tree structure to receive the object x. For instance, let y the zero integer with length equal to 2; then y is too small to accommodate any non-zero t_INT. In general common sense tells you what is possible, keeping in mind the PARI philosophy which says that if it makes sense it is valid. For instance, the assignment of an imprecise object into a precise one does not make sense. However, a change in precision of imprecise objects is allowed.

All functions ending in ``z'' such as gaddz (see Label se:low_level) implicitly use this function. In fact what they exactly do is record {avma} (see Label se:garbage), perform the required operation, gaffect the result to the last operand, then restore the initial avma.

You can assign ordinary C long integers into a PARI object (not necessarily of type t_INT). Use the function gaffsg with the following syntax:

void gaffsg(long s, GEN y)

Note: due to the requirements mentioned above, it is usually a bad idea to use gaffect statements. There is one exception: for simple objects (e.g. leaves) whose size is controlled, they can be easier to use than gerepile, and about as efficient.

Coercion. It is often useful to coerce an inexact object to a given precision. For instance at the beginning of a routine where precision can be kept to a minimum; otherwise the precision of the input is used in all subsequent computations, which is inefficient if the latter is known to thousands of digits. One may use the gaffect function for this, but it is easier and more efficient to call

GEN gtofp(GEN x, long prec) converts the complex number x (t_INT, t_REAL, t_FRAC, t_QUAD or t_COMPLEX) to either a t_REAL or t_COMPLEX whose components are t_REAL of length prec.

Copy.

It is also very useful to copy a PARI object, not just by moving around a pointer as in the y = x example, but by creating a copy of the whole tree structure, without pre-allocating a possibly complicated y to use with gaffect. The function which does this is called gcopy. Its syntax is:

GEN gcopy(GEN x)

and the effect is to create a new copy of x on the PARI stack.

Sometimes, on the contrary, a quick copy of the skeleton of x is enough, leaving pointers to the original data in x for the sake of speed instead of making a full recursive copy. Use GEN shallowcopy(GEN x) for this. Note that the result is not suitable for gerepileupto !

Make sure at this point that you understand the difference between y = x, y = gcopy(x), y = shallowcopy(x) and gaffect(x,y).

Clones

Sometimes, it is more efficient to create a persistent copy of a PARI object. This is not created on the stack but on the heap, hence unaffected by gerepile and friends. The function which does this is called gclone. Its syntax is:

GEN gclone(GEN x)

A clone can be removed from the heap (thus destroyed) using

void gunclone(GEN x)

No PARI object should keep references to a clone which has been destroyed!

Conversions

The following functions convert C objects to PARI objects (creating them on the stack as usual):

GEN stoi(long s): C long integer (``small'') to t_INT.

GEN dbltor(double s): C double to t_REAL. The accuracy of the result is 19 decimal digits, i.e. a type t_REAL of length DEFAULTPREC, although on 32-bit machines only 16 of them are significant.

We also have the converse functions:

long itos(GEN x): x must be of type t_INT,

double rtodbl(GEN x): x must be of type t_REAL,

as well as the more general ones:

long gtolong(GEN x),

double gtodouble(GEN x).


Implementation of the PARI types

We now go through each type and explain its implementation. Let z be a GEN, pointing at a PARI object. In the following paragraphs, we will constantly mix two points of view: on the one hand, z is treated as the C pointer it is, on the other, as PARI's handle on some mathematical entity, so we will shamelessly write z != 0 to indicate that the value thus represented is nonzero (in which case the pointer z is certainly non-NULL). We offer no apologies for this style. In fact, you had better feel comfortable juggling both views simultaneously in your mind if you want to write correct PARI programs.

Common to all the types is the first codeword z[0], which we do not have to worry about since this is taken care of by cgetg. Its precise structure depends on the machine you are using, but it always contain the following data: the internal type number associated to the symbolic type name, the length of the root in longwords, and a technical bit which indicates whether the object is a clone or not (see Label se:clone). This last one is used by gp for internal garbage collecting, you will not have to worry about it.

These data can be handled through the following macros:

long typ(GEN z) returns the type number of z.

void settyp(GEN z, long n) sets the type number of z to n (you should not have to use this function if you use cgetg).

long lg(GEN z) returns the length (in longwords) of the root of z.

long setlg(GEN z, long l) sets the length of z to l (you should not have to use this function if you use cgetg; however, see an advanced example in Label se:prog).

long isclone(GEN z) is z a clone?

void setisclone(GEN z) sets the clone bit.

void unsetisclone(GEN z) unsets the clone bit.

Remark. The clone bit is there so that gunclone can check it is deleting an object which was allocated by gclone. Miscellaneous vector entries are often cloned by gp so that a GP statement like v[1] = x does not involve copying the whole of v: the component v[1] is deleted if its clone bit is set, and is replaced by a clone of x. Don't set/unset yourself the clone bit unless you know what you are doing: in particular never set the clone bit of a vector component when the said vector is scheduled to be uncloned. Hackish code may abuse the clone bit to tag objects for reasons unrelated to the above instead of using proper data structures. Don't do that.

These macros are written in such a way that you do not need to worry about type casts when using them: i.e. if z is a GEN, typ(z[2]) is accepted by your compiler, as well as the more proper typ(gel(z,2)). Note that for the sake of efficiency, none of the codeword-handling macros check the types of their arguments even when there are stringent restrictions on their use.

Some types have a second codeword, used differently by each type, and we will describe it as we now consider each of them in turn.

Type t_INT (integer)

this type has a second codeword z[1] which contains the following information:

the sign of z: coded as 1, 0 or -1 if z > 0, z = 0, z < 0 respectively.

the effective length of z, i.e. the total number of significant longwords. This means the following: apart from the integer 0, every integer is ``normalized'', meaning that the most significant mantissa longword is non-zero. However, the integer may have been created with a longer length. Hence the ``length'' which is in z[0] can be larger than the ``effective length'' which is in z[1].

This information is handled using the following macros:

long signe(GEN z) returns the sign of z.

void setsigne(GEN z, long s) sets the sign of z to s.

long lgefint(GEN z) returns the effective length of z.

void setlgefint(GEN z, long l) sets the effective length of z to l.

The integer 0 can be recognized either by its sign being 0, or by its effective length being equal to 2. Now assume that z != 0, and let

   |z |= sum_{i = 0}^n z_i B^i, {where}S< >z_n != 0S< >{and}S< >B = 2^{BITS_IN_LONG}.

With these notations, n is lgefint(z) - 3, and the mantissa of z may be manipulated via the following interface:

GEN int_MSW(GEN z) returns a pointer to the most significant word of z, z_n.

GEN int_LSW(GEN z) returns a pointer to the least significant word of z, z_0.

GEN int_W(GEN z, long i) returns the i-th significant word of z, z_i. Accessing the i-th significant word for i > n yields unpredictable results.

GEN int_precW(GEN z) returns the previous (less significant) word of z, z_{i-1} assuming z points to z_i.

GEN int_nextW(GEN z) returns the next (more significant) word of z, z_{i+1} assuming z points to z_i.

Unnormalized integers, such that z_n is possibly 0, are explicitly forbidden. To enforce this, one may write an arbitrary mantissa then call

void int_normalize(GEN z, long known0)

normalizes in place a non-negative integer (such that z_n is possibly 0), assuming at least the first known0 words are zero.

For instance a binary and could be implemented in the following way:

  GEN AND(GEN x, GEN y) {
    long i, lx, ly, lout;
    long *xp, *yp, *outp; /* mantissa pointers */
    GEN out;
    if (!signe(x) || !signe(y)) return gen_0;
    lx = lgefint(x); xp = int_LSW(x);
    ly = lgefint(y); yp = int_LSW(y); lout = min(lx,ly); /* > 2 */
    out = cgeti(lout); out[1] = evalsigne(1) | evallgefint(lout);
    outp = int_LSW(out);
    for (i=2; i < lout; i++)
    {
      *outp = (*xp) & (*yp);
      outp  = int_nextW(outp);
      xp    = int_nextW(xp);
      yp    = int_nextW(yp);
    }
    if ( !*int_MSW(out) ) out = int_normalize(out, 1);
    return out;
  }

This low-level interface is mandatory in order to write portable code since PARI can be compiled using various multiprecision kernels, for instance the native one or GNU MP, with incompatible internal structures (for one thing, the mantissa is oriented in different directions).

The following further macros are available:

long mpodd(GEN x) which is 1 if x is odd, and 0 otherwise.

long mod2(GEN x), mod4(x), and so on up to mod64(x), which give the residue class of x modulo the corresponding power of 2, for positive x. By definition, modn(x) := modn(|x|) for x < 0 (the macros disregard the sign), and the result is undefined if x = 0.

These macros directly access the binary data and are thus much faster than the generic modulo functions. Besides, they return long integers instead of GENs, so they do not clutter up the stack.

Type t_REAL (real number)

this type has a second codeword z[1] which also encodes its sign, obtained or set using the same functions as for a t_INT, and a binary exponent. This exponent is handled using the following macros:

long expo(GEN z) returns the exponent of z. This is defined even when z is equal to zero, see Label se:whatzero.

void setexpo(GEN z, long e) sets the exponent of z to e.

Note the functions:

long gexpo(GEN z) which tries to return an exponent for z, even if z is not a real number.

long gsigne(GEN z) which returns a sign for z, even when z is neither real nor integer (a rational number for instance).

The real zero is characterized by having its sign equal to 0. If z is not equal to 0, then is is represented as 2^e M, where e is the exponent, and M belongs to [1, 2[ is the mantissa of z, whose digits are stored in z[2],..., z[lg(z)-1].

More precisely, let m be the integer (z[2],..., z[lg(z)-1]) in base 2^BITS_IN_LONG; here, z[2] is the most significant longword and is normalized, i.e. its most significant bit is 1. Then we have M := m.2^{1 - bit_accuracy(lg(z))}.

Thus, the real number 3.5 to accuracy bit_accuracy(lg(z)) is represented as z[0] (encoding type = t_REAL, lg(z)), z[1] (encoding sign = 1, expo = 1), z[2] = 0xe0000000, z[3] = .. .= z[lg(z)-1] = 0x0.

Type t_INTMOD

z[1] points to the modulus, and z[2] at the number representing the class z. Both are separate GEN objects, and both must be t_INTs, satisfying the inequality 0 <= z[2] < z[1].

It is good practice to keep the modulus object on the heap, so that new t_INTMODs resulting from operations can point at this common object, instead of carrying along their own copies of it on the stack. The library functions implement this practice almost by default.

Type t_FRAC (rational number)

z[1] points to the numerator n, and z[2] to the denominator d. Both must be of type t_INT such that d != 0, n > 0 and (n,d) = 1 (see gred_frac2).

Type t_COMPLEX (complex number)

z[1] points to the real part, and z[2] to the imaginary part. A priori z[1] and z[2] can be of any type, but only certain types are useful and make sense (mostly t_INT, t_REAL and t_FRAC).

Type t_PADIC (p-adic numbers)

this type has a second codeword z[1] which contains the following information: the p-adic precision (the exponent of p modulo which the p-adic unit corresponding to z is defined if z is not 0), i.e. one less than the number of significant p-adic digits, and the exponent of z. This information can be handled using the following functions:

long precp(GEN z) returns the p-adic precision of z.

void setprecp(GEN z, long l) sets the p-adic precision of z to l.

long valp(GEN z) returns the p-adic valuation of z (i.e. the exponent). This is defined even if z is equal to 0, see Label se:whatzero.

void setvalp(GEN z, long e) sets the p-adic valuation of z to e.

In addition to this codeword, z[2] points to the prime p, z[3] points to p^{{precp(z)}}, and z[4] points to at_INT representing the p-adic unit associated to z modulo z[3] (and to zero if z is zero). To summarize, if z != 0, we have the equality:

   z = p^{{valp(z)}} * (z[4] + O(z[3])), {where} z[3] = O(p^{{precp(z)}}).

Type t_QUAD (quadratic number)

z[1] points to the canonical polynomial P defining the quadratic field (as output by quadpoly), z[2] to the ``real part'' and z[3] to the ``imaginary part''. The latter are of type t_INT, t_FRAC, t_INTMOD, or t_PADIC and are to be taken as the coefficients of z with respect to the canonical basis (1,X) or Q[X]/(P(X)), see Label se:compquad. Exact complex numbers may be implemented as quadratics, but t_COMPLEX is in general more versatile (t_REAL components are allowed) and more efficient.

Operations involving a t_QUAD and t_COMPLEX are implemented by converting the t_QUAD to a t_REAL (or t_COMPLEX with t_REAL components) to the accuracy of the t_COMPLEX. As a consequence, operations between t_QUAD and exact t_COMPLEXs are not allowed.

Type t_POLMOD (polmod)

as for t_INTMODs, z[1] points to the modulus, and z[2] to a polynomial representing the class of z. Both must be of type t_POL in the same variable, satisfying the inequality deg z[2] E<lt> deg z[1]. However, z[2] is allowed to be a simplification of such a polynomial, e.g a scalar. This is tricky considering the hierarchical structure of the variables; in particular, a polynomial in variable of lesser priority (see Label se:priority) than the modulus variable is valid, since it is considered as the constant term of a polynomial of degree 0 in the correct variable. On the other hand a variable of greater priority is not acceptable; see Label se:priority for the problems which may arise.

Type t_POL (polynomial)

this type has a second codeword. It contains a ``sign'': 0 if the polynomial is equal to 0, and 1 if not (see however the important remark below) and a variable number (e.g. 0 for x, 1 for y, etc...).

These data can be handled with the following macros: signe and setsigne as for t_INT and t_REAL,

long varn(GEN z) returns the variable number of the object z,

void setvarn(GEN z, long v) sets the variable number of z to v.

The variable numbers encode the relative priorities of variables as discussed in Label se:priority. We will give more details in Label se:vars. Note also the function long gvar(GEN z) which tries to return a variable number for z, even if z is not a polynomial or power series. The variable number of a scalar type is set by definition equal to BIGINT, which has lower priority than any other variable number.

The components z[2], z[3],...z[lg(z)-1] point to the coefficients of the polynomial in ascending order, with z[2] being the constant term and so on.

For an object of type t_POL, leading_term, constant_term, degpol return a pointer to the leading term (with respect to the main variable of course), constant term, and degree of the polynomial (with the convention deg (0) = -1). Applied to any other type, the result is unspecified. Note that no copy is made on the pari stack so the returned value is not safe for a basic gerepile call. Note that degpol(z) = lg(z) - 3.

The leading term is not allowed to be an exact 0 (unnormalized polynomial). To prevent this, one may use

GEN normalizepol(GEN x) applied to an unnormalized t_POL x (with all coefficients correctly set except that leading_term(x) might be zero), normalizes x correctly in place and returns x. For internal use.

long degree(GEN x) returns the degree of x with respect to its main variable even when x is not a polynomial (a rational function for instance). By convention, the degree of 0 is -1.

Important remark. A zero polynomial can be characterized by the fact that its sign is 0. However, its length may be greater than 2, meaning that all the coefficients of the polynomial are equal to zero, but the leading term z[lg(z)-1] is an inexact zero. More precisely, gcmp0(x) is true for all coefficients x of the polynomial, an isexactzero(x) is false for the leading coefficient. The same remark applies to t_SERs.

Type t_SER (power series)

This type also has a second codeword, which encodes a ``sign'', i.e. 0 if the power series is 0, and 1 if not, a variable number as for polynomials, and an exponent. This information can be handled with the following functions: signe, setsigne, varn, setvarn as for polynomials, and valp, setvalp for the exponent as for p-adic numbers. Beware: do not use expo and setexpo on power series.

The coefficients z[2], z[3],...z[lg(z)-1] point to the coefficients of z in ascending order. As for polynomials (see remark there), the sign of a t_SER is 0 if and only if the leading coefficient of the series is an inexact 0. (It cannot be an exact 0.)

Note that the exponent of a power series can be negative, i.e. we are then dealing with a Laurent series (with a finite number of negative terms).

Type t_RFRAC (rational function)

z[1] points to the numerator n, and z[2] on the denominator d. The denominator must be of type t_POL, with variable of higher priority than the numerator. The numerator n is not an exact 0 and (n,d) = 1 (see gred_rfac2).

Type t_QFR (indefinite binary quadratic form)

z[1], z[2], z[3] point to the three coefficients of the form and are of type t_INT. z[4] is Shanks's distance function, and must be of type t_REAL.

Type t_QFI (definite binary quadratic form)

z[1], z[2], z[3] point to the three coefficients of the form. All three are of type t_INT.

Type t_VEC and t_COL (vector)

z[1], z[2],...z[lg(z)-1] point to the components of the vector.

Type t_MAT (matrix)

z[1], z[2],...z[lg(z)-1] point to the column vectors of z, i.e. they must be of type t_COL and of the same length.

Type t_VECSMALL (vector of small integers)

z[1], z[2],...z[lg(z)-1] are ordinary signed long integers. This type is used instead of a t_VEC of t_INTs for efficiency reasons, for instance to implement efficiently permutations, polynomial arithmetic and linear algebra over small finite fields, etc.

The next two types were introduced for specific gp use, and you are better off using the standard malloc'ed C constructs when programming in library mode. We quote them for completeness, advising you not to use them:

Type t_LIST (list)

This one has a second codeword which contains an effective length (handled through lgeflist / setlgeflist). z[2],..., z[lgeflist(z)-1] contain the components of the list.

Type t_STR (character string)

char * GSTR(z) ( = (z+1)) points to the first character of the (NULL-terminated) string.

Implementation note: for the types including an exponent (or a valuation), we actually store a biased non-negative exponent (bit-ORing the biased exponent to the codeword), obtained by adding a constant to the true exponent: either HIGHEXPOBIT (for t_REAL) or HIGHVALPBIT (for t_PADIC and t_SER). Of course, this is encapsulated by the exponent/valuation-handling macros and needs not concern the library user.


PARI variables

=head2 Multivariate objectsvariable

(priority)>

We now consider variables and formal computations, and give the technical details corresponding to the general discussion in Label se:priority. As we have seen in Label se:impl, the codewords for types t_POL and t_SER encode a ``variable number''. This is an integer, ranging from 0 to MAXVARN. Relative priorities may be ascertained using

int varncmp(long v, long w)

which is > 0, = 0, < 0 whenever v has lower, resp. same, resp. higher priority than w.

The way an object is considered in formal computations depends entirely on its ``principal variable number'' which is given by the function

long gvar(GEN z)

which returns a variable number for z, even if z is not a polynomial or power series. The variable number of a scalar type is set by definition equal to BIGINT which has lower priority than any valid variable number. The variable number of a recursive type which is not a polynomial or power series is the variable number with highest priority among its components. But for polynomials and power series only the ``outermost'' number counts (we directly access varn(x) in the codewords): the representation is not symmetrical at all.

Under gp, one needs not worry too much since the interpreter defines the variables as it sees themFOOTNOTE<<< The first time a given identifier is read by the GP parser (and is not immediately interpreted as a function) a new variable is created, and it is assigned a strictly lower priority than any variable in use at this point. On startup, before any user input has taken place, 'x' is defined in this way and has initially maximal priority (and variable number 0). >>>

and do the right thing with the polynomials produced (however, have a look at the remark in Label se:rempolmod).

But in library mode, they are tricky objects if you intend to build polynomials yourself (and not just let PARI functions produce them, which is less efficient). For instance, it does not make sense to have a variable number occur in the components of a polynomial whose main variable has a lower priority, even though PARI cannot prevent you from doing it; see Label se:priority for a discussion of possible problems in a similar situation.

Creating variables

A basic difficulty is to ``create'' a variable. As we have seen in Label se:intro4, a number of objects is associated to variable number v. Here is the complete list: pol_1[v] and pol_x[v], which you can use in library mode and which represent, respectively, the monic monomials of degrees 0 and 1 in v; varentries[v], and polvar[v]. The latter two are only meaningful to gp, but they have to be set nevertheless. All of them must be properly defined before you can use a given integer as a variable number.

Initially, this is done for 0 (the variable x under gp), and MAXVARN, which is there to address the need for a ``temporary'' new variable in library mode and cannot be input under gp. No documented library function can create from scratch an object involving MAXVARN (of course, if the operands originally involve MAXVARN, the function abides). We call the latter type a ``temporary variable''. The regular variables meant to be used in regular objects, are called ``user variables''.

User variables
When the program starts, x is the only user variable (number 0). To define new ones, use

long fetch_user_var(char *s)

which inspects the user variable named s (creating it if needed), and returns its variable number.

  long v = fetch_user_var("y");
  GEN gy = pol_x[v];

This function raises an error if s is already registered as a function name.

Caveat: you can use gp_read_str (see Label se:gp_read_str) to execute a GP command and create GP variables on the fly as needed:

  GEN gy = gp_read_str("'y"); /* returns pol_x[v], for some v */
  long v = varn(gy);

But please note the quote 'y in the above. Using gp_read_str("y") might work, but is dangerous, especially when programming functions to be used under gp. The latter reads the value of y, as currently known by the gp interpreter, possibly creating it in the process. But if y has been modified by previous gp commands (e.g y = 1), then the value of gy is not what you expected it to be and corresponds instead to the current value of the gp variable (e.g gen_1).

Technical remark If you are rewriting the gp interpreter, you may use the lower level

entree * fetch_named_var(char *s)

which returns an entree* suitable for inclusion in the interpreter hashlists of symbols.

Temporary variables
MAXVARN is available, but is better left to pari internal functions (some of which do not check that MAXVARN is free for them to use, which can be considered a bug). You can create more temporary variables using

long fetch_var()

This returns a variable number which is guaranteed to be unused by the library at the time you get it and as long as you do not delete it (we will see how to do that shortly). This has higher priority than any temporary variable produced so far (MAXVARN is assumed to be the first such). This call updates all the aforementioned internal arrays. In particular, after the statement v = fetch_var(), you can use pol_1[v] and pol_x[v]. The variables created in this way have no identifier assigned to them though, and they is printed as # < {number} > , except for MAXVARN which is printed as #. You can assign a name to a temporary variable, after creating it, by calling the function

void name_var(long n, char *s)

after which the output machinery will use the name s to represent the variable number n. The GP parser will not recognize it by that name, however, and calling this on a variable known to gp raises an error. Temporary variables are meant to be used as free variables, and you should never assign values or functions to them as you would do with variables under gp. For that, you need a user variable.

All objects created by fetch_var are on the heap and not on the stack, thus they are not subject to standard garbage collecting (they are not destroyed by a gerepile or avma = ltop statement). When you do not need a variable number anymore, you can delete it using

long delete_var()

which deletes the latest temporary variable created and returns the variable number of the previous one (or simply returns 0 if you try, in vain, to delete MAXVARN). Of course you should make sure that the deleted variable does not appear anywhere in the objects you use later on. Here is an example:

    long first = fetch_var();
    long n1 = fetch_var();
    long n2 = fetch_var(); /* prepare three variables for internal use */
    ...
    /* delete all variables before leaving */
    do { num = delete_var(); } while (num && num <= first);

The (dangerous) statement

    while (delete_var()) /* empty */;

removes all temporary variables in use, except MAXVARN which cannot be deleted.


Input and output

Two important aspects have not yet been explained which are specific to library mode: input and output of PARI objects.

Input.

For input, PARI provides you with one powerful high level function which enables you to input your objects as if you were under gp. In fact, it is essentially the GP syntactical parser, hence you can use it not only for input but for (most) computations that you can do under gp. It has the following syntax:

GEN gp_read_str(char *s)

In fact this function starts by filtering out all spaces and comments in the input string. They it calls the underlying basic function, the GP parser proper: GEN gp_read_str(char *s), which is slightly faster but which you probably do not need.

To read a GEN from a file, you can use the simpler interface

GEN gp_read_stream(FILE *file)

which reads a character string of arbitrary length from the stream file (up to the first complete expression sequence), applies gp_read_str to it, and returns the resulting GEN. This way, you do not have to worry about allocating buffers to hold the string. To interactively input an expression, use gp_read_stream(stdin).

Finally, you can read in a whole file, as in GP's read statement

GEN gp_read_file(char *name)

As usual, the return value is that of the last non-empty expression evaluated. Note that gp's metacommands are not recognized.

Once in a while, it may be necessary to evaluate a GP expression sequence involving a call to a function you have defined in C. This is easy using install which allows you to manipulate quite an arbitrary function (GP knows about pointers!). The syntax is

void install(void *f, char *name, char *code)

where f is the (address of) the function (cast to the C type void*), name is the name by which you want to access your function from within your GP expressions, and code is a character string describing the function call prototype (see Label se:gp.interface for the precise description of prototype strings). In case the function returns a GEN, it must satisfy gerepileupto assumptions (see Label se:garbage).

Output.

For output, there exist essentially three different functions (with variants), corresponding to the three main gp output formats (as described in Label se:output), plus three extra ones, respectively devoted to TeX output, string output, and debugging.

* ``raw'' format, obtained by using the function brute with the following syntax:

void brute(GEN obj, char x, long n)

This prints the PARI object obj in format x0.n, using the notations from Label se:format. Recall that here x is either 'e', 'f' or 'g' corresponding to the three numerical output formats, and n is the number of printed significant digits, and should be set to -1 if all of them are wanted (these arguments only affect the printing of real numbers). Usually one does not need that much flexibility, and gets by with the function

void outbrute(GEN obj), which is equivalent to brute(x,'g',-1),

or even better, with

void output(GEN obj) which is equivalent to outbrute(obj) followed by a newline and a buffer flush. This is especially nice during debugging. For instance using dbx or gdb, if obj is a GEN, typing print output(obj) enables you to see the content of obj (provided the optimizer has not put it into a register, but it is rarely a good idea to debug optimized code).

* ``prettymatrix'' format: this format is identical to the preceding one except for matrices. The relevant functions are:

void matbrute(GEN obj, char x, long n)

void outmat(GEN obj), which is followed by a newline and a buffer flush.

* ``prettyprint'' format: the basic function has an additional parameter m, corresponding to the (minimum) field width used for printing integers:

void sor(GEN obj, char x, long n, long m)

The simplified version is

void outbeaut(GEN obj) which is equivalent to sor(obj,'g',-1,0) followed by a newline and a buffer flush.

* The first extra format corresponds to the texprint GP function, and gives a TeX output of the result. It is obtained by using:

void texe(GEN obj, char x, long n)

* The second one is the function GENtostr which converts a PARI GEN to an ASCII string. The syntax is

char* GENtostr(GEN obj), wich returns a malloc'ed character string (which you should free after use).

* The third and final one outputs the hexadecimal tree corresponding to the gp metacommand \b x using the function

void voir(GEN obj, long nb), which only outputs the first nb words corresponding to leaves (very handy when you have a look at big recursive structures). If you set this parameter to -1 all significant words are printed. This last type of output is only used for debugging purposes.

Remark. Apart from GENtostr, all PARI output is done on the stream outfile, which by default is initialized to stdout. If you want that your output be directed to another file, you should use the function void switchout(char *name) where name is a character string giving the name of the file you are going to use. The output is appended at the end of the file. In order to close the file, simply call switchout(NULL).

Similarly, errors are sent to the stream errfile (stderr by default), and input is done on the stream infile, which you can change using the function switchin which is analogous to switchout.

(Advanced) Remark. All output is done according to the values of the pariOut / pariErr global variables which are pointers to structs of pointer to functions. If you really intend to use these, this probably means you are rewriting gp. In that case, have a look at the code in language/es.c (init80() or GENtostr() for instance).

Errors

If you want your functions to issue error messages, you can use the general error handling routine pari_err. The basic syntax is

    pari_err(talker, "error message");

This prints the corresponding error message and exit the program (in library mode; go back to the gp prompt otherwise). You can also use it in the more versatile guise

    pari_err(talker, format, ...);

where format describes the format to use to write the remaining operands, as in the printf function (however, see the next section). The simple syntax above is just a special case with a constant format and no remaining arguments.

The general syntax is

void pari_err(numerr,...)

where numerr is a codeword which indicates what to do with the remaining arguments and what message to print. The list of valid keywords is in language/errmessages.c together with the basic corresponding message. For instance, pari_err(typeer,"extgcd") prints the message:

      ***   incorrect type in extgcd.

To issue a warning, use

void pari_warn(warnerr,...) In that case, of course, we do not abort the computation, just print the requested message and go on. The basic example is

      pari_warn(warner, "Strategy 1 failed. Trying strategy 2")

which is the exact equivalent of pari_err(talker,...) except that you certainly do not want to stop the program at this point, just inform the user that something important has occurred (in particular, this output would be suitably highlighted under gp, whereas a simple printf would not).

The valid warning keywords are warner (general), warnprec (increasing precision), warnmem (garbage collecting) and warnfile (error in file operation), used as follows:

      pari_warn(warnprec, "bnfinit", newprec);
      pari_warn(warnmem,  "bnfinit");
      pari_warn(warnfile, "close", "log");  /* error when closing "log" */

Debugging output

The global variables DEBUGLEVEL and DEBUGMEM (corresponding to the default debug and debugmem, see Label se:defaults) are used throughout the PARI code to govern the amount of diagnostic and debugging output, depending on their values. You can use them to debug your own functions, especially after having made them accessible under gp through the command install (see Label se:install).

For debugging output, you can use printf and the standard output functions (brute or output mainly), but also some special purpose functions which embody both concepts, the main one being

void fprintferr(char *pariformat, ...)

Now let us define what a PARI format is. It is a character string, similar to the one printf uses, where % characters have a special meaning. It describes the format to use when printing the remaining operands. But, in addition to the standard format types, you can use %Z to denote a GEN object (we would have liked to pick %G but it was already in use!). For instance you could write:

  pari_err(talker, "x[%d] = %Z is not invertible!", i, x[i])

since the pari_err function accepts PARI formats. Here i is an int, x a GEN which is not a leaf and this would insert in raw format the value of the GEN x[i].

Timers and timing output.

To profile your functions, you can use the PARI timer. The functions long timer() and long timer2() return the elapsed time since the last call of the same function (in milliseconds). Two different functions (identical except for their independent time-of-last-call memories!) are provided so you can have both global timing and fine tuned profiling.

You can also use void msgtimer(char *format,...), which prints prints Time, then the remaining arguments as specified by format (which is a PARI format), then the output of timer2.

This mechanism is simple to use but not foolproof. If some other function uses these timers, and many PARI functions do use timer2 when DEBUGLEVEL is high enough, the timings will be meaningless. To handle timing in a reentrant way, PARI defines a dedicated data type, pari_timer. The functions

void TIMERstart(pari_timer *T)

long TIMER(pari_timer *T)

long msgTIMER(pari_timer *T, char *format,...)

provide an equivalent to timer and msgtimer, except they use a unique timer T containing all the information needed, so that no other function can mess with your timings. They are used as follows:

    pari_timer T;
    TIMERstart(&T); /* initialize timer */
    ...
    printf("Total time: %ld\n", TIMER(&T));

or

    pari_timer T;
    TIMERstart(&T);
    for (i = 1; i < 10; i++) {
      ...
      msgTIMER(&T, "for i = %ld (L[i] = %Z)", i, L[i]);
    }


A complete program

Now that the preliminaries are out of the way, the best way to learn how to use the library mode is to study a detailed example. We want to write a program which computes the gcd of two integers, together with the Bezout coefficients. We shall use the standard quadratic algorithm which is not optimal but is not too far from the one used in the PARI function bezout.

Let x,y two integers and initially \pmatrix{s_x s_y t_x t_y } = \pmatrix{1 0 0 1}, so that

   \pmatrix{s_x s_y t_x t_y } \pmatrix{x y } = \pmatrix{x y }.

To apply the ordinary Euclidean algorithm to the right hand side, multiply the system from the left by \pmatrix{0 1 1 -q }, with q = floor(x / y). Iterate until y = 0 in the right hand side, then the first line of the system reads

   s_x x + s_y y = gcd(x,y).

In practice, there is no need to update s_y and t_y since gcd(x,y) and s_x are enough to recover s_y. The following program is now straightforward. A couple of new functions appear in there, whose description can be found in the technical reference manual in Chapter 5.

file{../examples/extgcd.c}

Note that, for simplicity, the inner loop does not include any garbage collection, hence memory use is quadratic in the size of the inputs instead of linear.

\section{Adding functions to PARI}

Nota Bene

.

As mentioned in the COPYING file, modified versions of the PARI package can be distributed under the conditions of the GNU General Public License. If you do modify PARI, however, it is certainly for a good reason, hence we would like to know about it, so that everyone can benefit from it. There is then a good chance that your improvements are incorporated into the next release.

We classify changes to PARI into four rough classes, where changes of the first three types are almost certain to be accepted. The first type includes all improvements to the documentation, in a broad sense. This includes correcting typos or inacurracies of course, but also items which are not really covered in this document, e.g. if you happen to write a tutorial, or pieces of code exemplifying fine points unduly omitted in the present manual.

The second type is to expand or modify the configuration routines and skeleton files (the Configure script and anything in the config/ subdirectory) so that compilation is possible (or easier, or more efficient) on an operating system previously not catered for. This includes discovering and removing idiosyncrasies in the code that would hinder its portability.

The third type is to modify existing (mathematical) code, either to correct bugs, to add new functionalities to existing functions, or to improve their efficiency.

Finally the last type is to add new functions to PARI. We explain here how to do this, so that in particular the new function can be called from gp.

The calling interface from gp, parser codes

. A parser code is a character string describing all the GP parser needs to know about the function prototype. It contains a sequence of the following atoms:

* Syntax requirements, used by functions like for, sum, etc.:

  = separator = required at this point (between two arguments)

* Mandatory arguments, appearing in the same order as the input arguments they describe:

  G GEN   & *GEN   L long (we implicitly identify int with long)   S symbol (i.e. GP identifier name). Function expects a *entree   V variable (as S, but rejects symbols associated to functions)   n variable, expects a variable number (a long, not an *entree)   I string containing a sequence of GP statements (a seq), to be processed by gp_read_str   (useful for control statements)   E string containing a single GP statement (an expr), to be processed by readexpr   r raw input (treated as a string without quotes). Quoted args are copied as strings   Stops at first unquoted ')' or ','. Special chars can be quoted using '\'   Example: aa"b\n)"c yields the string "aab\n)c"   s expanded string. Example: Pi"x"2 yields "3.142x2"   Unquoted components can be of any PARI type (converted following current output   format)

* Optional arguments:

  s* any number of strings, possibly 0 (see s)   Dxxx argument has a default value

The s* code is technical and you probably do not need it, but we give its description for completeness. It reads all remaining arguments in string context (see Label se:strings), and sends a (NULL-terminated) list of GEN* pointing to these. The automatic concatenation rules in string context are implemented so that adjacent strings are read as different arguments, as if they had been comma-separated. For instance, if the remaining argument sequence is: "xx" 1, "yy", the s* atom sends a GEN *g = { &a, &b, &c, NULL}, where a, b, c are GENs of type t_STR (content "xx"), t_INT (equal to 1) and t_STR (content "yy").

The format to indicate a default value (atom starts with a D) is ``Dvalue,type,'', where type is the code for any mandatory atom (previous group), value is any valid GP expression which is converted according to type, and the ending comma is mandatory. For instance D0,L, stands for ``this optional argument is converted to a long, and is 0 by default''. So if the user-given argument reads 1 + 3 at this point, 4L is sent to the function; and 0L if the argument is omitted. The following special syntaxes are available:

\settabs\indent\indent Dxxx optional *GEN,   DG optional GEN, send NULL if argument omitted.

  D& optional *GEN, send NULL if argument omitted.

  DV optional *entree, send NULL if argument omitted.

  DI optional *char, send NULL if argument omitted.

  Dn optional variable number, -1 if omitted.

* Automatic arguments:

  f Fake *long. C function requires a pointer but we do not use the resulting long   p real precision (default realprecision)   P series precision (default seriesprecision, global variable precdl for the library)

* Return type: GEN by default, otherwise the following can appear at the start of the code string:

  i return int   l return long   v return void

No more than 8 arguments can be given (syntax requirements and return types are not considered as arguments). This is currently hardcoded but can trivially be changed by modifying the definition of argvec in anal.c:identifier(). This limitation should disappear in future versions.

When the function is called under gp, the prototype is scanned and each time an atom corresponding to a mandatory argument is met, a user-given argument is read (gp outputs an error message it the argument was missing). Each time an optional atom is met, a default value is inserted if the user omits the argument. The ``automatic'' atoms fill in the argument list transparently, supplying the current value of the corresponding variable (or a dummy pointer).

For instance, here is how you would code the following prototypes, which do not involve default values:

  GEN name(GEN x, GEN y, long prec)   ----> "GGp"
  void name(GEN x, GEN y, long prec)  ----> "vGGp"
  void name(GEN x, long y, long prec) ----> "vGLp"
  long name(GEN x)                    ----> "lG"
  int name(long x)                    ----> "iL"

If you want more examples, gp gives you easy access to the parser codes associated to all GP functions: just type \h function. You can then compare with the C prototypes as they stand in the paridecl.h.

Remark: If you need to implement complicated control statements (probably for some improved summation functions), you need to know about the entree type, which is not documented. Check the comment at the end of language/init.c and the source code in language/sumiter.c.

Coding guidelines.

Code your function in a file of its own, using as a guide other functions in the PARI sources. One important thing to remember is to clean the stack before exiting your main function, since otherwise successive calls to the function clutters the stack with unnecessary garbage, and stack overflow occurs sooner. Also, if it returns a GEN and you want it to be accessible to gp, you have to make sure this GEN is suitable for gerepileupto (see Label se:garbage).

If error messages or warnings are to be generated in your function, use pari_err and pari_warn respectively. Recall that pari_err does not return but ends with a longjmp statement. As well, instead of explicit printf / fprintf statements, use the following encapsulated variants:

void pariflush(): flush output stream.

void pariputc(char c): write character c to the output stream.

void pariputs(char *s): write s to the output stream.

void fprintferr(char *s): write s to the error stream (this function is in fact much more versatile, see Label se:dbg_output).

Declare all public functions in an appropriate header file, if you want to access them from C. For example, if dynamic loading is not available, you may need to modify PARI to access these functions, so put them in paridecl.h. The other functions should be declared static in your file.

Your function is now ready to be used in library mode after compilation and creation of the library. If possible, compile it as a shared library (see the Makefile coming with the extgcd example in the distribution). It is however still inaccessible from gp.

Integration with gp as a shared module

To tell gp about your function, you must do the following. First, find a name for it. It does not have to match the one used in library mode, but consistency is nice. It has to be a valid GP identifier, i.e. use only alphabetic characters, digits and the underscore character (_), the first character being alphabetic.

Then figure out the correct parser code corresponding to the function prototype, as explained above (Label se:gp.interface).

Now, assuming your Operating System is supported by install, write a GP script like the following:

  install(libname, code, gpname, library)
  addhelp(gpname, "some help text")

(see Label se:addhelp and se:install). The addhelp part is not mandatory, but very useful if you want others to use your module. libname is how the function is named in the library, usually the same name as one visible from C.

Read that file from your gp session (from your preferences file for instance, see Label se:gprc), and that's it. You can now use the new function gpname under gp, and we would very much like to hear about it!

Integration the hard way

If install is not available, things are more complicated: you have to hardcode your function in the gp binary (or install Linux). Here is what needs to be done:

You need to choose a section and add a file functions/section/gpname containing the following, keeping the notation above:

  Function:  I<gpname>
  Section:   I<section>
  C-Name:    I<libname>
  Prototype: I<code>
  Help:      I<some help text>

(If the help text does not fit on a single line, continuation lines must start by a whitespace character.) A GP2C-related Description field is also available to improve the code GP2C generates when compiling scripts involving your function. See the GP2C documentation for details.

At this point you can recompile gp, which will first rebuild the functions database.

Example.

A complete description could look like this:

  {
    install(bnfinit0, "GD0,L,DGp", ClassGroupInit, "libpari.so");
    addhelp(ClassGroupInit, "ClassGroupInit(P,{flag=0},{data=[]}):
      compute the necessary data for ...");
  }

which means we have a function ClassGroupInit under gp, which calls the library function bnfinit0 . The function has one mandatory argument, and possibly two more (two 'D' in the code), plus the current real precision. More precisely, the first argument is a GEN, the second one is converted to a long using itos (0 is passed if it is omitted), and the third one is also a GEN, but we pass NULL if no argument was supplied by the user. This matches the C prototype (from paridecl.h):

    GEN bnfinit0(GEN P, long flag, GEN data, long prec)

This function is in fact coded in basemath/buch2.c, and is in this case completely identical to the GP function bnfinit but gp does not need to know about this, only that it can be found somewhere in the shared library libpari.so.

Important note: You see in this example that it is the function's responsibility to correctly interpret its operands: data = NULL is interpreted by the function as an empty vector. Note that since NULL is never a valid GEN pointer, this trick always enables you to distinguish between a default value and actual input: the user could explicitly supply an empty vector!

Note: If install is not available, we have to add a file

functions/number_fields/ClassGroupInit

containing the following:

  Function: ClassGroupInit
  Section: number_fields
  C-Name: bnfinit0
  Prototype: GD0,L,DGp
  Help: ClassGroupInit(P,{flag=0},{tech=[]}): this routine does ...