Random variables and poetry... by Murray Lark

Murray Lark is back from his annual visit to The Isaac Newton Institute for Mathematical Sciences (INI) and invites you to explore with him some random variables and mathematical poetry...

The INI is a sort of large hadron collider for the mathematical sciences: you put mathematicians into it and see what happens when they bump into each other. The results of the collisions are captured on blackboards. These are on every available bit of wall space in the INI building; there is even one in the gents! When I first saw it I assumed that the architect was engaged in a bit of mickey-taking. Be that as it may, it had clearly been put to use. INI programmes may address abstract branches of pure mathematics, but they also tackle applications where new maths may be needed. Next year, for example, there is to be a four-month programme on the causes and consequences of melt in the Earth's mantle.

Unassuming on the outside but inspirational on the inside, it must be The Isaac Newton Institute for Mathematical Sciences

The INI is funded by five of the UK research councils as a centre of excellence in mathematics. The institute maintains a network of correspondents at universities and institutes around the UK to keep it in touch with the wider scientific world in which maths is developed and applied. I am the correspondent for BGS, which means that once a year I get to visit the INI in Cambridge to hear about its work and to give feedback.

At this year's meeting for correspondents we had an opportunity to discuss the INI's role, and that of maths more generally, with representatives from four of the UK research councils, including NERC, BGS's parent body. The discussions were informative and encouraging, and the importance of maths is clearly appreciated. But did any of the day's activities have any bearing on what I do day to day as Environmental Statistician at BGS? The answer is yes, but from an unexpected quarter.

Professor John Barrow (author of "Pi in the sky" and numerous other popular books on mathematics) gave a lecture to show that "Mathematics is Everywhere".  Of the examples he gave to illustrate his point it was, perhaps surprisingly, the mathematical analysis of poetry which connected with my own interests from earth sciences.

A little over 100 years ago the Russian mathematician A.A. Markov presented a quantitative analysis of Alexander Pushkin's verse novel Eugene Onegin. More recently Brian Hayes repeated the analysis on an English translation, and took it further. Consider any letter of the alphabet, "a" for example. When one analyses the text of Pushkin's poem one finds a number of letters that follow "a": "s", for example, in "was". One may compute for each of the 26 letters the probability that it succeeds "a". These are called the "transition probabilities". The transition probability for "b" as the successor to "a" can be written as P(b|a). If one computed the transition probabilities for all the successors to all letters, one could then generate "random Pushkin" by using these first-order transition probabilities to generate random sequences of letters. They are first-order because we generate a letter by examining its solitary predecessor. So, for example, when generating a letter we examine its predecessor; if that is "a" we then select a new letter at random, with the probability that it is "b" equal to P(b|a), the probability that it is "c" equal to P(c|a), and so on. An example of first-order random Pushkin, generated by Hayes, went "Theg sheso pa lyikly ut." That may be a true realization of first-order Pushkin rules, but it does not look very convincing.
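
For readers who like to see such things spelled out, here is a minimal Python sketch of the first-order scheme. It is an illustration only, not the code used by Markov or Hayes. The function names `transition_probabilities` and `generate` are my own, and I have assumed that only the 26 letters are kept, so spaces and punctuation are discarded. P(b|a) is estimated as the number of times "b" follows "a" divided by the number of times any letter follows "a".

```python
import random
import string
from collections import Counter, defaultdict


def transition_probabilities(text):
    """Estimate P(next | prev) for every pair of letters seen in the text.

    Assumption: only the letters a-z are kept; everything else is dropped.
    """
    letters = [c for c in text.lower() if c in string.ascii_lowercase]
    counts = defaultdict(Counter)
    for prev, nxt in zip(letters, letters[1:]):
        counts[prev][nxt] += 1
    probs = {}
    for prev, successors in counts.items():
        total = sum(successors.values())
        probs[prev] = {nxt: n / total for nxt, n in successors.items()}
    return probs


def generate(probs, start, length, rng):
    """Generate up to `length` letters, each drawn given its one predecessor."""
    out = [start]
    while len(out) < length:
        successors = probs.get(out[-1])
        if not successors:
            break  # the last letter was never followed by anything in the text
        candidates = list(successors)
        weights = [successors[c] for c in candidates]
        out.append(rng.choices(candidates, weights=weights)[0])
    return "".join(out)


if __name__ == "__main__":
    sample = "it was a bad day at the lake and the cat sat back"
    probs = transition_probabilities(sample)
    print(probs["a"])  # estimated P(.|a) for this tiny sample
    print(generate(probs, "a", 30, random.Random(1)))
```

With a short sample like this the output is meaningless; the point is only that each letter is chosen by looking at one predecessor and nothing else.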


Hayes computed higher-order transition probabilities. For example, the third-order transition probabilities would give us the probability of each possible successor to groups of three predecessor letters. The successors of "fro", for example, include "n" as in "front", and "m" as in "from". The complete set of rules assigns transition probabilities to all groups of three letters that occur in the original text, values that we denote by P(n|fro), for example. Once again, these can be used to generate random Pushkin. An example of third-order random Pushkin, again from Hayes, goes "At oness, and no fall makestic to us". It's still gibberish, but it is beginning to resemble English. Push on to fifth-order transition rules and we have "Farewell.  Evgeny loved one, honoured fate by calmly, not yet seeking". It won't win prizes, but it is recognizably English, and with something of the flavour of the translation of Pushkin on which it was statistically "conditioned".
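
The higher-order idea can be sketched in a few lines of Python. Again this is an illustration under my own assumptions rather than Hayes's program. The names `build_model` and `generate_text` are hypothetical, and here I have assumed that spaces and punctuation are treated as characters in their own right. The model simply counts which character follows each group of `order` characters, and those counts are used as sampling weights, which is equivalent to using the transition probabilities.

```python
import random
from collections import Counter, defaultdict


def build_model(text, order):
    """Count the successors of every run of `order` characters in the text.

    Assumption: every character, including spaces, is part of the context.
    """
    counts = defaultdict(Counter)
    for i in range(len(text) - order):
        context = text[i:i + order]
        counts[context][text[i + order]] += 1
    return counts


def generate_text(model, seed, length, rng):
    """Extend `seed` one character at a time; the order is len(seed) >= 1."""
    order = len(seed)
    out = seed
    while len(out) < length:
        successors = model.get(out[-order:])
        if not successors:
            break  # this context only occurred at the very end of the text
        candidates = list(successors)
        weights = [successors[c] for c in candidates]
        out += rng.choices(candidates, weights=weights)[0]
    return out


if __name__ == "__main__":
    sample = "from the front of the frozen fjord the frog looked on"
    model = build_model(sample, 3)
    print(dict(model["fro"]))  # counts of the successors of "fro"
    print(generate_text(model, "fro", 40, random.Random(2)))
```

To try it on something longer, read any plain-text file into `sample` and raise `order`. The higher the order, the more text is needed before most contexts have more than one successor.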

So what does this have to do with geology?

In the methods of spatial analysis commonly used for earth sciences data we usually work with the statistics of pairs of observations. That is to say, we represent the variability of a quantity by a mathematical model of the correlations between all pairs of observations. This is similar to the first-order transition probabilities of random Pushkin, which give a probability for one letter, given its immediate predecessor. This works for many tasks much of the time, but not for characterizing all the spatial properties of structured variables, such as properties of sedimentary rocks with a complex depositional history. In these circumstances the pairwise correlations alone do not capture, for example, the extent to which those parts of a rock through which water can flow most rapidly are connected to each other. That connectivity is very important if we are to predict liquid or gas flows through rock. This is directly comparable to the way in which the first-order transition probabilities fail to capture features which would make the random Pushkin recognizable as poetry in English.

One active area of research at BGS is in a field called multiple point geostatistics. One can think of this as directly analogous to the task of generating probability rules of sufficient order to generate random Pushkin which looks like the real thing. Our motivation is to be able to capture those features of data sets which control how fluids flow through a rock, or how contaminants might disperse in soil.

An example of recent research in this area at BGS is this study on random geometrical models for patterns of soil variation generated by ice wedges that formed in tundra conditions during the last ice age. 
 
http://nora.nerc.ac.uk/503392/  Full text will be freely downloadable after 21st July.

Murray
