
J

J is an array programming language. In this sense J is like Python, Ruby, R, MATLAB, and many other modern languages. But J adds an extra twist to array programming, and this tutorial is about that twist.

But first let's see why array programming languages are easier to use than more traditional counterparts like C or Java. The handling of loops is one of the most important factors in the usability of a language, and most loops that a programmer needs simply iterate over the items of a list. Array programming languages capitalise on this observation and provide a handy abbreviated notation for this special case. For example, if x is a list of values and you want to compute the sine of each of those values, then you can just write sin(x).

This is very handy. But such "implicit" loops are difficult to nest. Let's see one example. Suppose that you have a sequence of functions $f_n(x).$ To the computer $f$ is just a function of two variables $n$ and $x.$ If you want to plot $f_1, f_2,...,f_9$ for $x\in[0,1],$ say, then you'll need to take a grid of $x$ values and run two loops, one over $n$, the other over $x$. Although f(1,x) will indeed run the x-loop as expected, f(n,x) will not return the desired matrix. So you have to either use explicit loops or package f(n,x) in a wrapper function in which the x values are already hardcoded. Both these solutions, needless to say, are ugly. Thus "implicit" loops seem to be a benefit available only for trivial one-liners.

J has a brilliant solution for this problem, which allows "implicit" loops to work up to any level of nesting without making the code too ugly. The key concept behind this is the rank of a function. We shall illustrate with $f_n(x)=\frac{n+x}{n-x}.$ First write the function as a dyad with n as the left argument and x as the right argument. With a little J magic this is just

 
    f=.+%-

If you want you may use a more traditional explicit form

    f=.4 : '(x+y)%(x-y)'
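For readers new to tacit J, the three-verb train `+%-` is a fork: applied dyadically, it evaluates the two outer verbs on the arguments and combines the results with the middle one. The sketch below spells this out and checks it against an explicit definition; the name `g` is used only for this illustration.

```j
f =. + % -                 NB. fork: x f y  is  (x+y) % (x-y)
g =. 4 : '(x+y) % (x-y)'   NB. explicit equivalent; g is a name used only here

10 f 2                     NB. (10+2) % (10-2), i.e. 1.5
(10 f 2) = 10 g 2          NB. 1: both forms agree
```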

Whichever form you prefer, test it out with

    4 f 3

This works fine and produces f_4(3), as expected. Nothing special here. Even C, Python or R would behave similarly. Next take a list of values for x:

    x=. (i.10)%11

If you try

    4 f x

then you'll get a list of values f_4(x_i). This is more than what C or Java can produce, but quite achievable with Python or R. Next take a list of values for n:

    n=.i.10

The natural impulse now is to type

    n f x

Similar code in Python or MATLAB works component by component when the two lists have the same length and raises an error otherwise, while R recycles the shorter list and produces garbage. J is no different: here n and x both have 10 entries, so J pairs them up and returns the 10 values f_{n_i}(x_i); had the lengths differed, it would have reported a length error. Either way we do not get the table we want. Let us understand why.

If f(x,y) is a function that expects x,y to be numbers, then what should be the natural interpretation of f(x,y) when x,y are both lists? Unfortunately, there is no unanimous answer to this. If f(x,y)=x+y, then f([1,2,3],[4,5,6]) has the natural interpretation of component-by-component addition. But f(2,[1,2,3]) has a different natural interpretation: add 2 to all the entries. In our original example, the natural interpretation was a Cartesian product. In view of these different natural interpretations, it is quite understandable that software gets confused at this point.

Clearly, we need a way to specify exactly what we want, and it would be good if we could do so without giving up the luxury of implicit loops. J's rank mechanism is just such a thing. In our sequence of functions example, what we wanted was two nested loops: for each value of n we want to process all the x values. We do this in J as

    n f"0 1 x
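Before unpacking the notation, it may help to compare the shapes of the two results. The sketch below uses the names `ns` and `xs` (chosen only for this illustration) for the two lists, both of length 10.

```j
f  =. + % -
xs =. (i.10) % 11          NB. ten grid points in [0,1); xs is a name used only here
ns =. i.10                 NB. ten values of n; ns is a name used only here

$ ns f xs                  NB. 10: ns and xs are paired up item by item
$ ns f"0 1 xs              NB. 10 10: one row per n, one column per x
(3 { ns f"0 1 xs) -: 3 f xs   NB. 1: row 3 is the list f_3(x_i)
```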

The extra thing here is "0 1. The " means we are about to say how f should deal with lists. The 0 says that the left argument should be considered as a collection of numbers (0-dimensional objects). The 1 says that the right argument should be considered as a collection of 1-dimensional lists. So this now reduces to a familiar situation: many items on one side and a single item on the other.

Let's understand this more carefully. J works with lists, and lists of lists, and lists of lists of lists, and so on. But unlike languages like Lisp, J ensures a rectangular structure: all lists at the same depth must have the same length. Thus, if we have a list of lists, then we can refer to each element as in a matrix, $a_{ij}$. The convention is to use the leftmost index for the outermost list. For a single number we don't need any index, of course. The number of indices needed is what I called dimension. Thus a scalar is 0-dim, a vector is 1-dim, and a matrix is 2-dim. While this conforms to our intuitive notion of dimension, it does have some less intuitive side effects. A vector of length n is not a 1xn or nx1 matrix. Similarly, a scalar is not a 1x1 matrix. This distinction is somewhat like the distinction between $\phi$ and $\{\phi\}$.

As we have already noted, 1+[1 2 3 4] should naturally mean [2 3 4 5] and [1 2 3]+[4 2 5]=[5 4 8]. Instead of considering these as two isolated cases, J covers them with a common behaviour. Once the ranks are fixed, each argument is viewed as a frame of cells: the cells are the pieces of the stated rank, and the frame is whatever part of the shape is left over. J looks at the shapes of the two frames. If one is a prefix of the other, then the cells of the shorter one are repeated. Else, J issues a length error.

Now let's look again at our function sequence example. Once we specified "0 1, the left-hand argument is considered a frame of shape 10 (one cell per value of n), and the right-hand argument a frame of empty shape (the whole of x is a single cell). Since empty is a prefix of anything, the right-hand argument is reused for each left-hand cell. It will be instructive to make a plot of the functions:

    load 'plot'
    plot ;/ n f"0 1 x
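The statements about dimension and about matching shapes can be tried directly at the J prompt. The following is a small sketch using only built-in verbs: `$` reports the shape, and `#` applied to a shape counts the indices needed.

```j
# $ 5                      NB. 0: a scalar needs no index
# $ 1 2 3                  NB. 1: a vector needs one index
# $ 2 3 $ i.6              NB. 2: a matrix needs two
$ , 5                      NB. 1: a one-item list is not a scalar
$ 1 1 $ 5                  NB. 1 1: nor is a 1x1 matrix

1 + 1 2 3 4                NB. 2 3 4 5: the scalar is repeated
1 2 3 + 4 2 5              NB. 5 4 8: equal shapes pair up item by item
10 20 + i. 2 3             NB. shape 2 is a prefix of shape 2 3: 10 goes with row 0, 20 with row 1
NB. 1 2 + 1 2 3 would signal a length error: neither shape is a prefix of the other
```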

This approach proves handy in any situation where we are processing some vector or matrix data and our data analysis procedure involves a tuning parameter that we want to vary. One example is fitting a polynomial of a given degree to a data set: for each possible degree we want to compute the correlation coefficient.

In our function sequence example we worked with f_n(x). This was a special case of a family of functions, where the indexing parameter was a single integer. A more general example is a multi-parameter family. For example, the Gaussian density involves 2 parameters. Suppose that we want to plot the Gaussian density curve for the following parameter pairs: [0 1], [1 1], [2 3]. Then we first create a list of these pairs:

    par=.3 2 $ 0 1 1 1 2 3

Also we need to have a function to compute Gaussian density:

    sn=. ^@-@*:

This is $e^{-x^2},$ the core of the Gaussian density up to constant factors. We shall shift and scale it to get the other members of the family.

    n=.4 : '(sn (y-(0{x))%(1{x))%(1{x)'
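A quick sanity check of such a shift-and-scale verb might look as follows. This is a sketch under the assumption that the parameter pair (shift, scale) is the left argument and the evaluation points are the right argument; the name `dens` is hypothetical and used only here.

```j
sn   =. ^@-@*:                                   NB. e^(-t^2)
dens =. 4 : '(sn (y-(0{x))%(1{x))%(1{x)'         NB. hypothetical name; x is (shift, scale), y the points

sn 0                       NB. 1
0 1 dens 0                 NB. 1: shift 0 and scale 1 give back sn
2 3 dens 2                 NB. 0.333333: the peak sits at the shift and has height 1 % scale
```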

Now we are ready to plot.

    x=.(i:500)%50
    y=. par n"1 1 x
    plot x ; y
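The rank "1 1 pairs each row of the parameter table with the whole list of points, which can be confirmed by looking at shapes. The sketch below repeats the definitions so that it runs on its own; `dens` and `xs` are names chosen for this illustration, with the parameter pair on the left and the points on the right.

```j
sn   =. ^@-@*:
dens =. 4 : '(sn (y-(0{x))%(1{x))%(1{x)'         NB. hypothetical name; x is (shift, scale), y the points
par  =. 3 2 $ 0 1 1 1 2 3
xs   =. (i:500) % 50                             NB. 1001 points from _10 to 10

$ xs                       NB. 1001
$ par dens"1 1 xs          NB. 3 1001: one curve per row of par
(1 { par dens"1 1 xs) -: 1 1 dens xs             NB. 1: row 1 is the curve for the pair 1 1
```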

Of course, you can write the lines more compactly as follows.

    plot (] ; par&(n"1 1)) (i:500)%50
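Two pieces of tacit notation appear in a one-liner of this kind: a fork `(] ; g)`, which boxes the argument together with `g` of the argument, and the bond `a&v`, which fixes `a` as the left argument of `v`. Conjunctions group from the left, so `a&v"1 1` means `(a&v)"1 1`; parentheses are needed if the rank is meant to apply to `v` itself. The sketch below illustrates the difference with hypothetical names `a` and `v`.

```j
(] ; +:) 1 2 3             NB. fork: boxes 1 2 3 together with its double 2 4 6
(10&+) 1 2 3               NB. 11 12 13: & fixes 10 as the left argument of +

a =. 2 2 $ 10 20 30 40     NB. a and v are hypothetical names
v =. 4 : 'x + y'           NB. an explicit verb, which has infinite rank
a&(v"1 1) 1 2              NB. rows 11 22 and 31 42: each row of a is added to the list 1 2
(a&v"1 1) 1 2              NB. rows 11 21 and 32 42: the rank was attached to a&v, not to v
```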

While this is good, creating the list of parameter combinations by hand is not fun. Say you want to see the effect of the mean and standard deviation parameters on the Gaussian density. For this it is natural to take a few values for the mean and a few values for the standard deviation, and plot the density for each combination. Thus here you are forming the parameter values by taking the Cartesian product of two lists. This is another situation where a traditional programming language would use nested loops. J has a mechanism for such Cartesian products. Let's take a simpler example to understand it.

    (i.8) +/ (i.3)

produces an $8\times3$ table of sums. We want a similar thing where $a+b$ is replaced by $(a,b)$. Since J already has the , verb for this purpose, it is natural to try

    (i.8) ,/ i.3

Unfortunately, this merely concatenates the two arguments. We want to concatenate each number in i.8 with each number in i.3. We already know how to tell J that the individual numbers are our items of interest:

    (i.8) ,"0 0/ i.3

This almost achieves our aim, but not quite. We wanted just a list of lists, but what we have got is a list of lists of lists. It is an $8\times3\times2$ array. We want a $24\times2$ array. But this is easily achieved by a little reshaping:

    24 2 $, (i.8) ,"0 0/ i.3
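The shapes produced at each step above can be checked as follows. The last two lines are a side note rather than part of the original recipe: inserting `,` between the items with `,/` is another common way to merge the two leading axes, and it gives the same array as the reshape.

```j
$ (i.8) +/ i.3             NB. 8 3: the table of sums
$ (i.8) ,/ i.3             NB. 11: plain concatenation of the two lists
$ (i.8) ,"0 0/ i.3         NB. 8 3 2: a table whose entries are pairs
$ ,/ (i.8) ,"0 0/ i.3      NB. 24 2: ,/ joins the 8 tables of shape 3 2 end to end
(,/ (i.8) ,"0 0/ i.3) -: 24 2 $ , (i.8) ,"0 0/ i.3    NB. 1: same result as the reshape
```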

Why not turn this into a little dyad called c (for Cartesian)? We already have the template needed. We just need to make it work for any left and right arguments. Our first attempt looks like

    c=.4 :'x ,"0 0/ y'

To do the reshaping we need to multiply the lengths of x and y. So here is our second attempt:

    c=.4 :'(((#x) * #y),2)$, x ,"0 0/ y'

This is a handy thing to have. But its limitation will be apparent when you try

    (i.3) c (i.4) c 2 9 8
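To see the symptom concretely, read the expression as `(i.3) c (i.4) c 2 9 8` and look at the shapes involved. The sketch below repeats the second attempt at `c` so that it runs on its own.

```j
c =. 4 : '(((#x) * #y),2) $ , x ,"0 0/ y'        NB. the second attempt from the text

$ (i.4) c 2 9 8                 NB. 12 2: a list of pairs, as intended
$ (i.3) ,"0 0/ (i.4) c 2 9 8    NB. 3 12 2 2: 144 numbers, each paired with a single number
$ (i.3) c (i.4) c 2 9 8         NB. 36 2: the reshape keeps only 72 of them, so no triples appear
```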

The problem is that our verb cannot work with lists of lists. Of course, we can try to modify our function to handle this case:

    c=.4 :'(((#x) * #y),2)$, x ,"1 1/ y'

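But what to write in place of the 2? It should be the sum of the lengths of the items in x and y. Since all items in a list have the same length, it is enough to extract one and find its length: #{.x. Likewise, the fixed ranks in "1 1 should give way to the ranks of the items, which are one less than the ranks of x and y themselves: <:#$x. Putting everything together: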

    c=.4 : 0
    nx=.#x
    ny=.#y
    mx=.#{.x
    my=.#{.y
    rx=.<:#$x
    ry=.<:#$y
    ( (nx*ny),(mx+my) ) $, x ,"(rx,ry)/ y
    )
