Slippery Statistics and Fantastic Facts

Much has been said about statistical mathematics in general and statisticians in particular, not all of it complimentary. Mark Twain popularized the line, which he credited to Benjamin Disraeli: “There are three kinds of lies: lies, damned lies, and statistics.” Clearly Mr. Twain had a low opinion of the utility of statistical mathematics, assuming he recognized “statistics” as a mathematical science at all.

In the early nineteenth century, Laplace, with his work on probability, and Gauss and Quetelet, with their work on normal distributions, paved the way for statistical mathematics. “Statistics” as we know it today hit its stride in about 1860, when Sir Francis Galton formulated the concept of “statistical thinking”. Galton demonstrated that within all populations in nature (not acted upon by outside influence), the attributes of each individual follow a fixed and predictable relationship with the attributes of every other individual. That is, if the value of an attribute (like the number of spots on a giraffe) is plotted on the horizontal axis of a graph and the corresponding frequency of occurrence of that value in the population (the number of giraffes in the herd with that number of spots) is plotted on the vertical axis, the resulting curve is the well-known “bell” or “Gaussian” curve. Every time. Without fail. (Sir Francis used the lengths of pea pods to demonstrate his point, but that’s another story.)

Because of this invariability, statistical mathematicians can predict the entire curve knowing the values of only a few random samples from the population. For example, mathematicians have developed statistical inference to such a high degree that today much of the manufacturing in the world is monitored using Statistical Process Control (SPC), a method of monitoring manufacturing output by evaluating small numbers of samples from the total production population. SPC can, for example, signal when tool wear is about to push product out of tolerance, so that the tool is replaced before reject product is made.
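As a rough illustration of the idea, here is a minimal sketch of a control chart in Python. It is not a production SPC implementation: real SPC usually works with subgroup averages and range-based estimates, while this sketch simply sets limits at the baseline mean plus or minus three sample standard deviations. The part diameters, the drift rate and the function names are all assumptions invented for the example.

```python
import random
import statistics


def control_limits(samples, k=3.0):
    """Return (lower, center, upper): the sample mean -/+ k sample standard deviations."""
    center = statistics.mean(samples)
    spread = statistics.stdev(samples)
    return center - k * spread, center, center + k * spread


def out_of_control(measurements, limits):
    """Return the indices of measurements that fall outside the control limits."""
    lower, _, upper = limits
    return [i for i, x in enumerate(measurements) if x < lower or x > upper]


rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical baseline: 30 part diameters (mm) from a process that is in control.
baseline = [rng.gauss(10.0, 0.05) for _ in range(30)]
limits = control_limits(baseline)
lower, center, upper = limits

# Hypothetical tool wear: the average diameter creeps up 0.01 mm with each part.
production = [rng.gauss(10.0 + 0.01 * i, 0.05) for i in range(40)]
flagged = out_of_control(production, limits)

print(f"control limits: {lower:.3f} .. {upper:.3f} mm")
print("first part outside the limits:", flagged[0] if flagged else "none")
```

The limits come from only 30 baseline parts, yet they describe where nearly all in-control output should fall. A drifting process shows up as points crossing a limit, which is the cue to change the tool.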

Statistical mathematics is a very precise science. Statistical inference can not only predict an entire population but also state the exact confidence level of that prediction, all based on data from a relatively small number of random samples. Every time. It is not possible to “manipulate” statistical inference. The data is the data. The calculations are the calculations and therefore the results are the results. The data goes in and the results come out. There is no way to alter the process.
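To make the idea of a stated confidence level concrete, here is a minimal sketch in Python that estimates a population mean from a small random sample and reports an interval around it. It assumes a normal approximation (a z-interval). For very small samples a t-based interval would be somewhat wider, and the standard library does not provide one. The giraffe herd, its size and the function name are invented for illustration.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(samples, confidence=0.95):
    """Return (low, high) for the population mean, using a normal approximation."""
    n = len(samples)
    center = mean(samples)
    standard_error = stdev(samples) / math.sqrt(n)
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return center - z * standard_error, center + z * standard_error


rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Hypothetical herd: spot counts for 10,000 giraffes.
herd = [rng.gauss(200, 25) for _ in range(10_000)]

# Count spots on only 50 randomly chosen giraffes.
sample = rng.sample(herd, 50)
low, high = mean_confidence_interval(sample, confidence=0.95)

print(f"estimated mean spots: {mean(sample):.1f}")
print(f"95% confidence interval: {low:.1f} .. {high:.1f}")
print(f"actual herd mean: {mean(herd):.1f}")
```

Asking for more confidence widens the interval and taking more samples narrows it. Neither changes the arithmetic, which is the point of the paragraph above.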

Yet daily we see presentations of “statistics” that seem to fly in the face of reality. How can that be? The answer is in THE DATA. While it is not possible to manipulate statistical inference, it is possible to give the process bad input. The classic “Garbage In, Garbage Out” scenario. The only way to “manipulate” statistical results is to manipulate the data.
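A small sketch in Python shows how this works. The same calculation is run twice, once on all of the data and once on data from which the “inconvenient” values have been quietly dropped. The measurements, the cutoff and the function name are assumptions made up for the example.

```python
import random
from statistics import mean


def summarize(data):
    """The 'process': an honest average of whatever data it is handed."""
    return mean(data)


rng = random.Random(7)  # fixed seed so the sketch is repeatable

# Hypothetical measurements scattered around a true value of zero.
full = [rng.gauss(0.0, 1.0) for _ in range(1000)]

# Cull every reading below -0.5 before anyone else sees the data.
culled = [x for x in full if x > -0.5]

print(f"all {len(full)} readings:      mean = {summarize(full):+.3f}")
print(f"{len(culled)} 'selected' readings: mean = {summarize(culled):+.3f}")
```

The function is identical in both cases. Only the input was changed, and the result moved with it.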

Which brings us to “The Scientific Method”, a much-vaunted and much-maligned process. Few outside the pure sciences understand its use, and some within the scientific community either don’t understand it or willfully abuse it. Simply put, the Scientific Method holds that a tentative conclusion stands so long as there is data in support of it and, more importantly, no data refuting it. What does that mean? It means that all scientific “facts” are tentative, forever. They are always subject to the presentation of new or additional data that refutes them. They are all “Just Theories”.

An example: Many years ago it was an accepted scientific fact that the earth was flat. All available data supported the concept: an observer standing on the seashore could see the edge. Even so, astronomers came to doubt the theory because of the observed sphericity of everything else in the known universe. People died as heretics for presenting data that refuted conventional wisdom, but the scientific method eventually won out and a round earth emerged from the dispute. In more recent times, the notion of a perfectly spherical earth has been replaced with that of a slightly flattened one, bulging at the equator. The scientific method prevails.

The scientific method can’t work, however, unless all of the data is included in the evaluation of a theory. If only the “good” data is included and the “bad” data excluded, the validity of the theory is decided outside the realm of the scientific method, and the theory, although it could be true, is suspect.

Another example: Scientists at Texas Instruments were evaluating some new gallium arsenide semiconductor diodes that had been designed for use in high-frequency electronic circuits. One of the attributes they studied was the efficiency with which the diodes conducted electrical energy from their inputs to their outputs, theorizing that lost electrical energy would be converted to heat energy. They devised an experiment that precisely measured the electrical energy put into the diode, the electrical energy returned from the diode and the heat produced by the diode.

Then they gathered data. When they looked at their experimental results, they found that the heat energy produced was less than the difference between the electrical energy in and the electrical energy out! Bad data? No, data is data. Bad experiment? Possibly, but repeated checks of their equipment revealed no flaws. Bad theory? Very likely. But if the lost electrical energy hadn’t become heat, where did it go? Another form? Motion? No. Sound? No. Light? Maybe. Further checks confirmed that the missing energy was emitted as light. Thanks to the scientific diligence and persistence of those experimenters, Light Emitting Diodes are now ubiquitous in our world. (A horrible thought: Had they applied their “statistics” in a more popularly accepted manner, we might be struggling through life today without a TV remote!)

The popular notion of statistics bears little, if any, similarity to the definition given by the scientific and mathematical community. Data is routinely culled and manipulated by special interest groups, politicians and editorialists masquerading as journalists. Often, the stories presented as a result are just that: stories. Isn’t it sad that in today’s world of web communication, where all the data is easily available to everyone, the loudest voices remain the posturing and blustering pseudo-intellectuals for whom facts forever remain malleable?