Saturday, June 6, 2009

A Little (Random) Learning

Kia ora tātou – Hello Everyone
Fellow blogging companion and friend Paul C, of quoteflections, gave us this challenge:

For the month of June could anyone so inclined go on an interesting personal journey for good quotations and include some reflections?
Here’s my June contribution.

Suppose you were asked to pick 50 street numbers at random from your local telephone directory. You might have some fun working out a method of selecting them so that, as near as possible, your list showed a chance selection of numbers.

What if you were then asked to sort them by their first digit, called the leading digit? You might put all the numbers beginning with 1 at the top of the list, such as 14, 105 and 138. The next set might be 27, 223, 2901 and so on. Since the numbers were selected in truly random fashion, and there are nine possible categories (1 to 9), you might be forgiven for thinking that your list would look something like this:

11, 14, 19, 103, 198, 1613
21, 23, 24, 27, 213, 259
3, 30, 33, 35, 37, 38, 39, 359
4, 40, 43, 49, 490
5, 50, 54, 57, 57, 502
6, 61, 61, 62, 65, 66
7, 7, 70, 71, 704
82, 85, 836, 840
90, 91, 93, 913, 962


As you can see, each line holds roughly the number of entries you might expect: about 6 (50 ÷ 9 is about 5.6), give or take one or two to allow for the unevenness of such a small sample.
It’s the sort of distribution you'd get from a random number generator.

Trial by experiment

You can conduct a similar experiment of your own. Take today’s newspaper, for instance. Scan through it, gathering all the numbers in the text on each page; the financial pages are good for this.

Starting from the left of each number and ignoring any sign, decimal point or leading zeros, the first digit you come across is the leading digit; for example, the leading digit of -0.0042 is 4. There are 9 possible leading digits, so it seems reasonable to expect that 1/9 (about 11.1%) of all numbers would have 1 as the leading digit. That is not what’s found in practice: in many collections of real-world figures, close to 30% of the numbers begin with 1.
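The rule for finding a leading digit is easy to sketch in Python. This is a minimal sketch, assuming each number is copied from the page as text; `leading_digit` is a hypothetical helper name of my own. Because it returns the first non-zero digit it meets, it deals with signs, decimal points, commas, currency symbols and leading zeros in one go.

```python
def leading_digit(text: str) -> int:
    """Return the leading digit (1-9) of a number copied from a page as text.

    Every character other than the digits 1-9 is skipped, which takes care of
    signs, decimal points, thousands separators, currency symbols and leading
    zeros. Zeros after the first non-zero digit never matter, because the
    function stops at that digit.
    """
    for ch in text:
        if ch in "123456789":
            return int(ch)
    raise ValueError(f"no non-zero digit in {text!r}")


# A few numbers of the kind you might find in a newspaper.
for sample in ["14", "-0.0042", "2,901", "$1,613.50"]:
    print(sample, leading_digit(sample))
```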

Here’s what I got when I made a random selection of 50 street numbers from the Wellington White Pages telephone directory:

10, 11, 12, 14, 15, 15, 18, 19, 103, 113, 119, 126, 128, 133, 136, 137, 198, 1613
21, 21, 23, 24, 27, 213, 259
3, 30, 33, 36, 37, 359
4, 43, 49, 490
5, 50, 54, 502
6, 61, 66
7, 7, 70
82, 85, 840
91, 93


Not exactly what you might predict if every leading digit were equally likely. Eighteen of the 50 numbers (36%) begin with 1, while only two begin with 9. It’s a fascinating observation that this lopsided distribution turns up in a great many random selections, measurements and totals drawn from real-world data.
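Tallying my 50 street numbers by leading digit takes only a few lines of Python. This sketch assumes the numbers are whole and positive, as street numbers are, so the leading digit is simply the first character of each number written out.

```python
from collections import Counter

# The 50 Wellington street numbers listed above.
street_numbers = [
    10, 11, 12, 14, 15, 15, 18, 19, 103, 113, 119, 126, 128, 133, 136, 137, 198, 1613,
    21, 21, 23, 24, 27, 213, 259,
    3, 30, 33, 36, 37, 359,
    4, 43, 49, 490,
    5, 50, 54, 502,
    6, 61, 66,
    7, 7, 70,
    82, 85, 840,
    91, 93,
]


def leading_digit(n: int) -> int:
    """First digit of a positive whole number."""
    return int(str(n)[0])


counts = Counter(leading_digit(n) for n in street_numbers)
even_share = len(street_numbers) / 9  # about 5.6 per digit if all digits were equally likely

for d in range(1, 10):
    print(f"{d}: {counts[d]:2d} observed, {even_share:.1f} expected if even")
```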

A universal law

It is the same sort of distribution that’s found when the lengths of fish, in millimetres, are recorded from random samplings; the same scattering that occurs when pebbles are picked haphazardly from a beach and weighed in grams; and the same pattern that's obtained when the brightness of stars is measured from a random selection in the night sky. Contrary to what you might expect, the leading digits are not evenly distributed in samples like these, especially when the values span several orders of magnitude.

Random number generators that pick every whole number from, say, 1 to 999 with equal likelihood, however, do not produce distributions like this; each leading digit turns up about equally often. Herein lies the usefulness of what’s known as Benford’s Law in checking the authenticity of data, such as numerical research findings, measurements where genuine random sampling is expected, accumulated travel expenses or income tax returns.
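To see the contrast, here is a small Python sketch that draws whole numbers uniformly from 1 to 999 and tallies their leading digits. The range, sample size and seed are arbitrary choices of mine; 1 to 999 is convenient because exactly 111 numbers in it begin with each digit, so each share should come out close to 1/9.

```python
import random
from collections import Counter


def leading_digit(n: int) -> int:
    """First digit of a positive whole number."""
    return int(str(n)[0])


rng = random.Random(2009)  # fixed seed so the run can be repeated
draws = [rng.randint(1, 999) for _ in range(90_000)]  # uniform on 1..999 inclusive
counts = Counter(leading_digit(n) for n in draws)

for d in range(1, 10):
    share = counts[d] / len(draws)
    print(f"{d}: {share:.3f}")  # each share should sit near 1/9, about 0.111
```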

Benford’s Law has become a widely used tool for fraud detection when checking the genuineness of financial and related data.
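One common way to turn Benford’s Law into a first-digit check is a chi-squared goodness-of-fit test of the observed leading digits against the counts the law predicts. The sketch below applies it to my 50 street numbers; the function name `first_digit_chi2` is hypothetical, and with only 50 numbers several expected counts fall below 5, so treat the result as a rough guide rather than a verdict.

```python
import math
from collections import Counter

# Benford's Law: P(d) = log10(1 + 1/d) for leading digit d = 1..9.
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

# Critical value of the chi-squared distribution with 8 degrees of
# freedom (9 digits minus 1) at the 5% level, from standard tables.
CHI2_CRITICAL_8DF_5PCT = 15.507

street_numbers = [
    10, 11, 12, 14, 15, 15, 18, 19, 103, 113, 119, 126, 128, 133, 136, 137, 198, 1613,
    21, 21, 23, 24, 27, 213, 259, 3, 30, 33, 36, 37, 359, 4, 43, 49, 490,
    5, 50, 54, 502, 6, 61, 66, 7, 7, 70, 82, 85, 840, 91, 93,
]


def first_digit_chi2(numbers):
    """Chi-squared statistic of observed leading digits against Benford's Law.

    Assumes positive whole numbers, so the leading digit is the first character.
    """
    counts = Counter(int(str(n)[0]) for n in numbers)
    total = len(numbers)
    chi2 = 0.0
    for d, p in BENFORD.items():
        expected = total * p
        chi2 += (counts[d] - expected) ** 2 / expected
    return chi2


stat = first_digit_chi2(street_numbers)
print(f"chi-squared = {stat:.2f}")
if stat < CHI2_CRITICAL_8DF_5PCT:
    print("consistent with Benford's Law")
else:
    print("worth a closer look")
```

For these street numbers the statistic comes out at about 1.26, far below the 5% critical value of 15.51, so they sit comfortably with Benford’s Law.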

How does it work?

Why aren't the distributions even, as you might expect? One much-simplified way of understanding this is to look at how such numbers come about in the first place. Here’s my simple model linking the leading digits of numbers that grow over time with how often those digits occur.

Let’s suppose that I invested a dollar in a company that paid me a return of 10% a month on the sum invested. I decide to reinvest with the company whenever the total of my returns and the sum invested reaches or passes the next whole-dollar amount, reinvesting that whole-dollar sum and setting any loose change aside. So the sum invested grows by a whole dollar each time it’s reinvested: $1.00, $2.00, $3.00 and so on.

I keep monthly notes on the total value of my investment. It takes ten months for my original dollar to earn enough for the total to reach the reinvestment value of $2. During this time my notebook shows the following monthly totals:

$1.00, $1.10, $1.20, $1.30, $1.40, $1.50, $1.60, $1.70, $1.80, $1.90

When my notebook shows a value of $2.00, the total amount is reinvested. My notebook then shows these monthly value totals:

$2.00, $2.20, $2.40, $2.60, $2.80

The next set of monthly records in my notebook looks like this:

$3.00, $3.30, $3.60, $3.90

And the next monthly set like this:

$4.00, $4.40, $4.80

At $5.00 my monthly records are:

$5.00, $5.50

And so on. Ten entries begin with 1, five with 2, four with 3, three with 4 and two with 5: the pattern of numbers recorded in my notebook is not unlike what’s expected by Benford’s Law. The probabilities of each leading digit occurring are shown in the chart:
Benford's Law Distribution Chart
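The notebook model can also be sketched in Python and run on for as many months as you like. This is a minimal sketch, assuming the sum invested earns 10% a month and that, as soon as the total reaches or passes the next whole dollar, the whole-dollar part is reinvested and any loose change is set aside; that is one way to read the $3.90 to $4.00 and $4.80 to $5.00 steps in the notebook. The function name `notebook` is hypothetical, and amounts are kept in cents so the arithmetic stays exact.

```python
from collections import Counter


def notebook(months: int) -> list[int]:
    """Monthly notebook totals, in cents, under the reinvestment model above."""
    invested = 100  # $1.00, kept in cents
    total = invested
    records = []
    for _ in range(months):
        records.append(total)
        total += invested // 10  # 10% monthly return on the sum invested
        if total >= invested + 100:  # reached or passed the next whole dollar
            invested = total // 100 * 100  # reinvest the whole-dollar part
            total = invested  # loose change set aside (assumption)
    return records


records = notebook(32)  # $1.00 up to $9.90
print(", ".join(f"${c / 100:.2f}" for c in records))

# Every entry is at least 100 cents, so the first character of the
# amount in cents is also the leading digit of the dollar amount.
counts = Counter(str(c)[0] for c in records)
for d in "123456789":
    print(d, counts[d])
```

Over the first 32 months, ten entries begin with 1, five with 2, four with 3, three with 4 and two each with 5 to 9. The 1s make up about 31% of the entries, close to the 30.1% that Benford’s Law predicts.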
Benford’s Law also has a mathematical side, based on sound probability principles outlined in its Wikipedia article. If you are mathematically inclined and have the time, you may like to check it out.
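For reference, the standard statement of Benford’s Law gives the probability that a number’s leading digit is d as follows. The nine probabilities add up to exactly 1, because the product inside the logarithm telescopes.

```latex
P(d) = \log_{10}\!\left(1 + \frac{1}{d}\right), \qquad d = 1, 2, \ldots, 9

P(1) = \log_{10} 2 \approx 0.301, \quad
P(2) = \log_{10} 1.5 \approx 0.176, \quad
\ldots, \quad
P(9) = \log_{10} \tfrac{10}{9} \approx 0.046

\sum_{d=1}^{9} \log_{10}\frac{d+1}{d}
  = \log_{10}\left(\frac{2}{1}\cdot\frac{3}{2}\cdots\frac{10}{9}\right)
  = \log_{10} 10 = 1
```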

But the message to anyone who may think they know how to fiddle the books using bogus data from a random number generator or the like is summarised in the quotation:

A little learning is a dangerous thing
An Essay on Criticism, Alexander Pope


A Green Pen Society Contribution


Ngā mihi nui – Best wishes

2 comments:

Paul C said...

A humbling introduction (for me) of an interesting and well explained mathematical principle. I would love to sit in on a math class with you to get the full impact of this lesson.

Yes, a little learning is a dangerous thing, as is a short post for a complex concept.

Thanks, Ken, for the link and taking part in the trial process of an informal future writing club. I will link to you in a post tomorrow.

Blogger In Middle-earth said...

Kia ora e Paul!

Ah, "a short post for a complex concept". I know what you mean. This post is as short as I could write without it being 'dangerous'.

Though the concept is complicated, it's not really complex in the true sense of the word. Looking at the numerical patterns in the rise and fall of global markets? Now that's complex!

Catchya later