# Monte Carlo Integration(教程,scratchapixel,编程语言)

• TAG :

https://www.scratchapixel.com/lessons/mathematics-physics-for-computer-graphics/monte-carlo-methods-in-practice/monte-carlo-integration

if u understand and know about the most importance concepts of probability and statistics in we introduced in lesson 16, u will see that understanding monte carlo integration is incredibly simple. this method is used a lot in rendering (ray tracing in particular) and shading. in this chapter we will speak about the theory and in the next chapter we will actually study a practical example. to understand how MC integration is used in rendering, u first need to know about the rendering equation (which is the topic of the next lesson). we will then show how the method is used in the following lesson (introduction to light transport).

monte carlo estimator: evaluating integrals
before u start reading this chapter, it is important that u understand the law of the unconscious statistician which we explained in this chapter from lesson 16. it states that the expected value of a function of a random variable f(x) can be defined as : where Px is the probability distribution of the random variable X. this is hopefully something u understand well. if u do not, we strongly recommend that u carefully read the chapter which is devoted to this chapter. You need to be perfectluy comfortable with this idea to understand Monte Carlo integration.

At this point you should also be familiar with the concept of variance and standard deviation which we won’t talk about here (if you don’t you will find them explained in lesson 16). However, as a quick reminder, recall that variance can be defined in two equivalent ways (the second is just slightly more convenient): We will be using these formulas further down, so it also important that you understand them (they are explained in this chapter from lesson 16).

The principle of a basic Monte Carlo estimation is this: imagine that we want to integrate a one-dimensional function f(x) from a to b such as: As you may remember, the integral of a function f(x) can be interpreted as calculating the area below the function’s curve. This idea is illustrated in figure 1. Figure 1: the integral over the domain [a,b] can be seen as the area under the curve.

now image that we just pick up a randome value, say x in the range [a,b], evaluate the function f(x) at x and multiply the result by (b-a). figure 2 shows what the result looks like: it is another rectangle (where f(x) is the height of that rectangle and (b-a) its width), which in a way u can also look at a very crude approximation of the area under the curve. of course we maybe get it more or less right. If we evaluate the function at x1 (figure 3) we quite drastically underestimate this area. if we evaluate the function at x2, we over estimate the area. But as we keep evaluating the function at different random points between a and b, adding up the area of the rectangles and averaging the sum, the resulting number gets closer and closer to the actual result of the integral.
It’s not surprising in a way as the rectangles which are too large compensate for the rectangles which are too small. And in fact, we will soon give the proof that summing them up and averaging their areas actually converges to the integral “area” as the number of samples used in the calculation increases. This idea is illustrated in the following figure. The function was evaluated in four different locations. The result of the function as these four values of x randomly chosen, are then multiplied by (b-a), summed up and averaged (we divide the sum by 4). The result can be considered as an approximation of the actual integral.

of course, as usual with monte carlo methods, this approximation converges to the integral result as the number of rectangles or samples used increases.

We can formalize this idea with the following formula: Where N here, is the number of samples used in this approximation. In mathematical notation (and statistics), ⟨S⟩⟨S⟩ represents the average of all the elements in S (⟨FN⟩⟨FN⟩ is an approximation of F using N samples. It is equivalent to the sample mean notation X¯nX¯n we used in lesson 16 and the two are actually equivalent). This equation is called a basic Monte Carlo estimator. The random point in the interval [a,b] can easily be obtained by multiplying the result of a random generator producing uniformly distributed numbers in the interval [0,1] with (b-a):
Xi=a+ξ(b−a)Xi=a+ξ(b−a) , where ξξ is uniformly distributed between zero and one.

The PDF of the resulting XiXis is 1/(b−a)1/(b−a). Since the random numbers are produced with equiprobability (each number is produced with the same probability than the others), we just divide 1 by the total number of outcomes as in the case of a dice. However in this example, the function is continuous (as opposed to discrete), so we divide 1 by the interval [a,b].

It is important here to note that: The law of large numbers which we talked in lesson 16, tells us that as N approaches infinity, our Monte Carlo approximation converges (in probabiliy) to the right answer (the probabiliy is 1).

Note also that ⟨FN⟩⟨FN⟩ is a random variable, since it’s actually made up of a sum of random numbers. We can now proof that the expected value of ⟨FN⟩⟨FN⟩ is equal to F: Remember that the pdf is equal to 1/(b-a) thus it cancels out the term (b-a) on the right inside of the integral sign (line 3).

Generalization to Arbitrary PDF
Now, as mentioned above, the formula we used for the Monte Carlo estimator is basic. Why? Because it only works if the PDF of the random variable X is uniform. However, we can extend Monte Carlo integration to random variables with arbitry PDFs. The more generic formula is then: this is the more generalized form of the monte carlo estimator, and the one u should remember (if there is only one equation to remember from the last two chapters, it is the one).

To be clear, the pdf in the denominator is the same as the pdf of the random variable X.

as with the basic monte carlo estimator, to be sure that this formula is valid, we need to check that this estimator has the correct expected value. let us check: The trick here, is in the substitution of (line 2): for (line 3): on the third line (and not to forget to divide f(x) by pdf(x)). Take the time to understand these equations. As we just said, this is the most important result of everything we have studied so far, and is the backbone of almost every algorithm we are going to study in the next lessons. If you don’t understand this algorithm, you won’t understand monte carlo ray tracing. With the rendering equation this is probably the second most important equation. Now, you will ask why would I ever want to draw samples from any other distribution than a uniform distribution? And that would be a very good question. The weak answer is “because maybe you can only use a given random generator to produce samples and that this generator has a non-uniform PDF”. Thus, at least, if that’s the case, we just demonstrated that you can still use a Monte Carlo integration, as long as you don’t forget to divide f(Xi)f(Xi) by pdf(Xi)pdf(Xi). But you will also see that this result will become handy when we will study variance reduction in the next chapter. So keep reading and you will soon understand why this result is important!

Let’s get an intuition as to why dividing f(x) by pdf(x) is necessary (for non-constant PDFs). As you know, the pdf gives the probability that a random variable X get some value xixi. When you draw samples from an arbitrary PDF, samples aren’t uniformly distributed: more samples are generated where the PDF is high and reversely, fewer samples are generated where the PDF is low (see adjacent figure). In a monte carlo integration though, the samples need to be uniformly distributed. If you generate a high concentration of samples in some region of the function (because the PDF is high in this region), the result of the Monte Carlo integration will be clearly biased. Dividing f(x) by pdf(x) though will counterbalance this effect. Indeed, when the pdf is high (which is also where more samples are generated) dividing f(x) by pdf(x) will decrase the “weight” of these samples in the sum. So in essence, we compensate for the higher concentration of samples, by weighting down their contribution. On the other hand, when the pdf is low, fewer samples are generated (the density of samples is lower than in region where the pdf is high). But if we divide f(x) by a low value (1 divided by 0.1 for instance), the contribution of these samples is scaled up. We compensate for the lesser density of samples by giving them more weight. That’s in essence, what the division of f(x) by pdf(x) does. It weights the samples’ contribution to compensate for the fact that samples generated from an arbitrary PDF won’t be uniformly distributed (assuming the PDF is not constant of course): it scales down samples generated in regions of higher density (where the PDF is high), and scales up samples generared in regions of lesser density (where the PDF is low).

We look into this concept in depth in the next chapter on variance reduction and importance sampling. 