Today, I wanted to publish a post on generating functions, based on discussions I had with Jean-Francois while having our coffee after lunch a couple of times already. The other reason is that I publish my post while my student just finished their Probability exam (and there were a few questions on generating functions).

**A short introduction (back on a specific exercise)**

In the Probability exam, I included an exercise we’ve seen in class, last week. The question is the following (question 16 in the form – in French). Let for and for be the cumulative distribution function of some random variable , i.e. . What is the moment generating function of , i.e. ?

Consider some (we’ll see later on if some additional constraint are necessary). The tricky part of this exercice appears extremely fast, actually: how could you write ? I mean, in any probability textbook, the standard answer is

- if is discrete,

- if is (absolutely) continuous,

where is the density of . Here, is clearly not a discrete variable. But is it (absolutely) continuous. My (strong) belief is that you need to plot that distribution function to see how it looks like, , for all

(following recent discussions with Philippe Reka, I will try to post more hand-made graphs)

Ooops. It looks like we have a discontinuity in 0. So we have to be a bit carefull here : is neither continuous nor discrete. Let us use the double projection formula,

which can also be writen, if ,

This is simply the idea of saying that the overall average is a barycenter of the average per subgroup. Here, and let while (note that ). Thus,

Let us consider the three different components.

and

(since it is is a real-valued constant), and here . So finally, we should compute . Observe that given is a (absolutely) continuous random variable, with a density. To get it, observe that for all ,

and , i.e. given is an exponential distribution.

Hence, is a mixture between an exponential variable and a Dirac mass in . This was actually the tricky part of the question since it is not obvious when we see (only) the formula above.

From now on, it is just high-school level computations,

if (for the first time, we see that the function is not defined everywhere). If we put all the expressions together,

**Monte Carlo computations**

If we are lazy (and trust me, I am extremely lazy), it is possible to use Monte Carlo simulations to compute that function,

> F=function(x) ifelse(x<0,0,1-exp(-x)/3) > Finv=function(u) uniroot(function(x) F(x)-u,c(-1e-9,1e4))$root

or (to avoid the problem of the discontinuity)

> Finv=function(u) ifelse(3*u>1,0,uniroot(function(x) + F(x)-u,c(-1e-9,1e4))$root))

Here, the inverse is simple to get, so we can faster the code using

> Finv=function(u) ifelse(3*u>1,0,-log(3*u))

Then, we use

> rF=function(n) Vectorize(Finv)(runif(n)) > M=function(t,n=10000) mean(exp(t*rF(n))) > Mtheo=function(t) (3-2*t)/(3-3*t) > u=seq(-2,1 ,by=.1) > v=Vectorize(M)(u) > plot(u,v,type="b",col='blue') > lines(u,Mtheo(u),col="red")

The problem with Monte Carlo simulations is that they should be used only if they are valid. If mean, I can compute

> set.seed(1) > M(3) [1] 5748134

Finite sum can always be computed, numerically. Even if here, does not exist (or to be more precise, is not finite). It is like the average of a Cauhy sample… I can always compute it, even if the expected value does not exists…

> set.seed(1) > mean(rcauchy(1000000)) [1] 0.006069028

This is related to questions I tried to ask a few years ago in a paper, where I wanted to test if (or not). Almost all the tests I know are actually based on that assumption… But this is not the point here. My point is that those generating functions are interesting, when then exist. And perhaps working with characteristic function is a better idea.

**Generating functions**

Now, to get back on the begining of last course, generating functions are interesting for a lot of reasons. But first of all, let us define those function properly.

The moment generating function exists if it is finite on a neighbourhood of (there is an such that for all , ). In that case, there exists some (open) interval such that for all , , called the convergence strip of the moment generating function.

This function is said to be moment generating, since if exists (as defined in the previous paragraph), then all moments exist, for all , . This is basically due to the fact that, for all , as , so, for all large enough, . And before, it is always possible to use a multiplicative constant,

for some . Thus,

if is small enough (namely belongs to the convergence strip).

Now, if we use Taylor’s expansion,

and

If we look at the value of the derivative of that function at point 0, then

As we’ve seen last week in class, it is possible to define a moment generating function in higher dimension, for some random vector ,

for some . It is again a moment generating function since crossed derivatives (taken a point ) are cross-moments. For instance,

Some, moment generating functions are interesting if you want to derive moments of a given distribution. Another interesting feature is that this moment generating function (under certain conditions) fully characterize the distribution of the random variable, in the sense that if for some ,

for all , then .

**From moment generating functions to characteristic functions**

The problem with the moment generating function is that the function is defined (only) on some neighborhood of , and we should be careful. The other problem is that it does exist only for distribution in . Which might be a strong assumption.

Thus, an interesting idea is to consider not on the real line, but on the imaginary line.

Thus, let for some . Actually, not some, but all , since

so the characteristic function always exists. Paul Lévy proved in 1925 that the characteristic function completely characterizes the distribution.

Now, if we look at it quickly, it looks like we did not change a lot of things here, and we should be able to write

If we want to do things properly, let us look at Gut (2005) for instance. Assume that is defined on some interval . It is then possible to define a function (this time, it is no longer a real-valued function) as

which is well defined on some strip .

and are then restriction of that function respectively on the imaginary line, and the real line. That function is clearly holomorphic, and thus, the value it takes on such a strip is fully determined by the values it takes on the real interval . Thus, the moment generating function will completely characterize the distribution.

But it has to be defined on some neighbourhood of . Which is not trivial actually… I mean, I nonlife insurance, we see a lot a Pareto distributions.

**Fast Fourier Transform**

Recall Euler’s formula,

Thus, we should not be surprised to see Fourier’s transform. From this formula, we can write

Using some results in Fourier analysis, we can prove that probability function satisfies (if the random variable has a Dirac mass in x)

which can also be written,

And a similar relationship can be obtained if the distribution is absolutely continuous at point ,

Actually, since we work with real-valued random variables, the complex area was just a detour, and we can prove that actually,

It is then possible to get the cumulative distribution function using Gil-Peleaz’s inversion formula, obtained in 1951,

Nice isn’t it. It means, anyone working on financial markets know those formulas, used to price options (see Carr & Madan (1999) for instance). And the good thing is that any mathematical or statistical software can be used to compute those formulas.

**Characteristic function and actuarial science**

Now, what is the interest of all that in actuarial science ? Characteristic functions are interesting when we deal with sums of independent random variables, since the characteristic function of the sum is simple the product of the characteristic functions. They are also interesting when dealing with compound sums^{1}. Consider the problem of computing the 99.5% quantile of the compound sum of Gamma random variable, i.e.

where are i.i.d. and . The strategy is to discretize the loss amounts,

> n <- 2^20; > p <- diff(pgamma(0:n-.5,alpha,beta))

Then, the code to compute , we use

> f <- Re(fft(exp(lambda*(fft(p)-1)),inverse=TRUE))/n

To compute the 99.5% quantile, we just use

> sum(cumsum(f)<.995)

That’s extremely simple, isn’it. Want me to do it for real ? Consider the following losses amounts

> set.seed(1) > X <- rexp(200,rate=1/100) > print(X[1:5]) [1] 75.51818 118.16428 14.57067 13.97953 43.60686

Let us fit a gamma distribution. We can use

> fitdistr(X,"gamma") shape rate 1.309020256 0.013090411 (0.117430137) (0.001419982)

or

> f <- function(x) log(x)-digamma(x)-log(mean(X))+mean(log(X)) > alpha <- uniroot(f,c(1e-8,1e8))$root > beta <- alpha/mean(X) > alpha [1] 1.308995 > beta [1] 0.01309016

Whatever, we have the parameters of our Gamma distribution for individual losses. And assume that the mean of the Poisson counting variable is

> lambda <- 100

Again, it is possible to use monte carlo simulations, if we can easily generate a compound sum. We can use the following generic code: first we need functions to generate the two kinds of variables of interest,

> rN.P <- function(n) rpois(n,lambda) > rX.G <- function(n) rgamma(n,alpha,beta)

then, we can use (see here for a discussion on possible codes)

> rcpd4 <- function(n,rN=rN.P,rX=rX.G){ + return(sapply(rN(n), function(x) sum(rX(x))))}

If we generate one million variables, we can get an estimator for the quantile,

> set.seed(1) > quantile(rcpd4(1e6),.995) 99.5% 13651.64

Another idea is to remember a proporty of the Gamma distribution: a sum of independent Gamma distributions is still Gamma (with additional assumptions on the parameters, but here we consider identical Gamma distributions). Thus, it is possible to compute the cumulative distribution function of the compound sum,

> F <- function(x,lambda=100,nmax=1000) {n <- 0:nmax + sum(pgamma(x,n*alpha,beta)*dpois(n,lambda))}

(or at least a approximation). If we invert that function, we get our quantile

> uniroot(function(x) F(x)-.995,c(1e-8,1e8))$root [1] 13654.43

Which is consistent with our monte carlo computation. Now, we can also use fast Fourier transform here,

> n <- 2^20; lambda <- 100 > p <- diff(pgamma(0:n-.5,alpha,beta)) > f <- Re(fft(exp(lambda*(fft(p)-1)),inverse=TRUE))/n

> sum(cumsum(f)<.995) [1] 13654

Now, if it is simple, is it efficient ? Let us compare for instance computation time to get those three outputs,

> system.time(quantile(rcpd4(1e5),.995)) user system elapsed 2.453 0.106 2.611 > system.time(uniroot(function(x) F(x)-.995,c(1e-8,1e8))$root) user system elapsed 0.041 0.012 0.361 > system.time(sum(cumsum(Re(fft(exp(lambda*(fft(p)-1)),inverse=TRUE))/n)<.995)) user system elapsed 0.527 0.020 0.560

Computations here are comparable with the (numerical) inversion of the cumulative distribution function. Except that here, we were lucky: if the distribution is not Gamma but log normal, the second algorithm cannot be used.

^{1. This numerical example is taken from the first chapter of Computational Actuarial Science with R, to appear in a few months.}