Accuracy, precision and reproducibility. These are the foundations of science that make our progress possible. How do these play into a scientist’s daily activities? And just how precise can we get with our measurements?
Fraser: Welcome to AstronomyCast, our weekly facts-based journey through the Cosmos, where we help you understand not only what we know, but how we know what we know. My name is Fraser Cain; I’m the publisher of Universe Today, and with me is Dr. Pamela Gay, a professor at Southern Illinois University – Edwardsville. Hi, Pamela. How are you doing?
Pamela: I’m doing well. How are you doing, Fraser?
Fraser: Doing very well, very cold, a little sick, so people can hear it in my voice — I’m a little nasal-y today, so I apologize. It should be going away. But now you had a couple of things that we wanted to talk about today. One is you wanted to remind people about the survey that we’re running.
Pamela: Right, so every couple of years we do a general listener survey to find out how we’re impacting you, what you want to learn and exactly why you come and spend spare time listening to the two of us talk astronomy. So if you can spend a few months…spend a few minutes…
Fraser: A few months…
Pamela: [laughing] …answering some questions – spend a few minutes. If you could spend a few minutes answering some survey questions for us, the survey is at http://www.astrosphere.org/Surveys/AstronomyCast/survey_astronomycast.php …we’re just going to put all the links up for you. I’ll be tweeting that in just a minute, and it will be in our shownotes for this episode.
Fraser: OK cool, and then the other thing we wanted to remind everyone is now that it’s 2012 and that the world is ending, we’re going to be on a cruise at the end of the year to celebrate the “end of the world” – not. So, at the very moment that the Mayans predicted…I guess the Mayans ran out of calendar space, we will be cruising around the coast of Mexico, talking about astronomy and celebrating the continuation of the world with all our other skeptical friends, so that’s going to be…David Brin’s going to be on this cruise, there’s going to be astronauts… it’s going to be awesome! We’re going to be there, and we’re going to be doing probably live episodes of AstronomyCast, and you can hang out with us. Play shuffleboard.
Pamela: And all the details are at astrosphere.org, again, and if you sign up, when you talk to the travel agency, tell them you’re part of the AstronomyCast group. We’ll have special freebies and special events just for you on the cruise ship.
Fraser: Yeah, and then we want to know, sort of, how many people are coming for us, so we can sort of accommodate that. Awesome! And know that there’s a limited number of spaces, so it’s one of those things where we’ll keep nagging you until all the spaces are gone, and then we won’t talk about it anymore. OK, and so then one last little piece of work: and so once again we’re recording this episode of AstronomyCast as a live Google plus hang-out. We record these every Monday at noon Pacific, 3:00 Eastern, 8:00 pm Greenwich Mean Time…that’s all the time zones I can think of, so if you want to join us live, we try to sort of get an announcement out ahead of time and then watch the video and participate, ask questions, jump in the hang-out with us, and uh, yeah, it’s sort of the next level of interaction, and big thanks to Google for letting us use this technology. It’s awesome! Alright, well let’s get rolling.
Fraser: Alright, so accuracy, precision and reproducibility – these are the foundations of science that make our progress possible, but how do these play into a scientist’s daily activities, and just how precise can we get with our measurements? And are we going to talk about scientific notation?
Fraser: No? OK.
Pamela: [laughing] No.
Fraser: That stuff was like the bane of my university existence. But right, so then let’s talk about it. I mean, precision…you proposed this topic this week, so I want to get an idea of where you want to go with this.
Pamela: Well, right now we live in a world where lots of different things are being discussed in terms of “is it real?” There’s the “neutrinos travel faster than the speed of light” issue, there’s dark energy, dark matter, there’s the Higgs Boson, and the question that people should be asking is “At what level can I believe the things that I’m hearing are true?” And this plays out on so many different things. It plays out on at what point do you believe a Kepler detection of a planet is real? At what point do you believe the detection of a distant galaxy isn’t just a fluke of your detector on a given day? So as we’re trying to sort out all of these different things, there’s lots of vocabulary that comes into play, and lots of ideas that come into play that we don’t generally have to deal with in everyday life, so I wanted to spend an episode getting into things like, well, what is the difference between precision and accuracy? What do you mean you can have a highly precise result that means absolutely nothing? So this is where I wanted to go.
Fraser: So then, I guess, how would you say that scientists define that precision? Like where…how are you going to measure that?
Pamela: So we deal with two basic variables. One is how precise is your measurement? And that basically says, if I take a measurement and I repeat it over and over and over and over, the values are either tightly bundled together, or, if it’s not a precise result, they’re spread out a lot. The example we use when we’re teaching is throwing darts. If you’re a highly precise dart thrower, all your darts are going to land within half an inch of each other. If you’re an imprecise dart thrower, your darts are going to land spread out over two, three meters, maybe, on the wall. You’re taking up the whole wall with your five darts. So precision is how closely spaced all of your results are.
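The dartboard analogy can be sketched numerically. This is an editorial illustration, not anything from the show; the names and numbers are made up: `bias` stands in for a systematic offset (accuracy), and `spread` stands in for random scatter (precision).

```python
import random
import statistics

def simulate_throws(true_value, bias, spread, n=10000, seed=42):
    """Simulate n repeated measurements (dart throws).

    bias   -> systematic offset: controls accuracy
    spread -> random scatter:    controls precision
    """
    rng = random.Random(seed)
    return [true_value + bias + rng.gauss(0, spread) for _ in range(n)]

true_value = 100.0

# Precise but inaccurate: tightly bundled darts, wrong spot on the board.
precise_wrong = simulate_throws(true_value, bias=5.0, spread=0.5)

# Accurate but imprecise: centered on the bullseye, scattered widely.
accurate_loose = simulate_throws(true_value, bias=0.0, spread=5.0)

# Accuracy: how far the mean sits from the true value.
# Precision: how tightly the values bundle (standard deviation).
print(statistics.mean(precise_wrong) - true_value)   # near 5 -> inaccurate
print(statistics.stdev(precise_wrong))               # near 0.5 -> precise
print(statistics.mean(accurate_loose) - true_value)  # near 0 -> accurate
print(statistics.stdev(accurate_loose))              # near 5 -> imprecise
```

The point of the sketch: the mean tells you about accuracy, the standard deviation about precision, and the two are independent, exactly as in the dartboard picture.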
Fraser: And so if you’re, like, doing some kind of scientific research, and you’re looking for some expected outcome, you’re going to want the expected outcome to be precise; otherwise, it’s just going to be random noise.
Pamela: Exactly. And at a certain level, we always start off with a fair amount of noise in our results. As people have worked on trying to define the expansion rate of the Universe, they have gone from plus or minus a few hundred kilometers per second in the early years of trying to make these measurements, to plus or minus a few kilometers per second. So over time, we get our results closer and closer; the error bars on the age of the Universe have gotten smaller and smaller. You always start off with less precise results, and get better and more refined as you go, but precision and accuracy aren’t the same thing, and this is one of the things we have to worry about when we start looking at things like the “neutrinos moving faster than the speed of light” problem.
Fraser: So I guess in that case, you know, they were very precise in that all of the neutrinos that they detected were moving faster than the speed of light within a very close range; you know, they threw all those darts and they all landed very close to each other, but the question now is are those results accurate?
Pamela: Exactly. So you can imagine the person who throws all of their darts, and they all land within an inch of each other in the very last circle of the dartboard. So they have very precise results that are in entirely the wrong place, and this is one of those problems where….
Fraser: Wrong dartboard…yeah.
Pamela: [laughing] Right, and so this is the type of thing that can come down to…well, I don’t know what it would come down to with the dart thrower, but there have been plenty of experiments that have been done where you don’t realize your meter stick is missing the last three millimeters of the meter stick, you don’t realize your equipment is misaligned by one degree.
Fraser: Right, so you could have something that’s causing noise in your experiment in a very predictable, very…you know, it’s always doing the same thing, and it’s always wrong by the exact same amount.
Pamela: It’s a systematic offset. People who fire guns deal with this when their sights are off, or if you’re playing a video game you may realize “Crud! I always need to click up and to the left 3 pixels in order to actually hit the thing correctly.” So there are systematic offsets, and if you don’t know what they are, you can’t believe any of your results.
Fraser: And there have been many situations where those systematic offsets were happening without people realizing it, and so they thought they had both accuracy and precision?
Pamela: Well, they keep cropping up throughout all of history. Most people, I think, believe that the “neutrinos are moving faster than light” result is actually going to turn out to be some systematic problem: not understanding the way the Earth is stretching, not understanding the distances, not understanding the general relativity involved and the corrections needed to match two clocks in two locations. We just look back and we keep finding small things like this. I know there have been cases of “Oh, crud! We forgot to take into account the fact that the Milky Way is moving!” Those sorts of little things add up when you’re trying to figure things out on cosmic scales, or even local scales. It’s the type of thing where we figure it out fast enough that there aren’t too many glaring mistakes.
Fraser: I can think of a good example. Remember how Newton’s laws made predictions about the movement of Mercury? Eventually telescopes got better and better and better, and scientists were able to measure the position of Mercury with great precision, and yet it never quite matched the theory’s predictions, right?
Pamela: And that wasn’t so much an error in measurement as an error in understanding.
Fraser: Right, in prediction, yeah.
Pamela: Yeah. We have to worry about two different things. It’s the systematic offsets in our measuring, which is where we worry about the neutrinos, but sometimes we’re just missing a term in our equations, and that doesn’t so much go into precision and accuracy as we just missed a term in our theory, so that’s…different bin.
Fraser: Now, there was a really good example that came up quite recently with the possible detection of the Higgs Boson by CERN. They talked a lot about, you know…was it Sigma? Degrees of Sigma?
Pamela: How many…yeah. So this starts to get into noise theory. Any time we make a measurement, there’s going to be some sort of inherent noise in it. If we’re trying to detect light, there’s this constant steady stream of photons at all colors creating a noisy background. There are minor fluctuations whenever you make measurements with a ruler, with a laser, with any tool, and if the noise is truly random, all of those different ways things can go wrong combine to form what’s called a Gaussian distribution, a bell curve, a normal distribution, where the majority of the measurements land pretty close to the same value. If you plot all of your points, 34.1% of the values will fall within one standard deviation above the mean, and another 34.1% will fall within one standard deviation below it; that band is what we call one Sigma, so about 68.2% of your measurements land within one Sigma of the central value. I know it’s all quite confusing. For 3 Sigma, instead of percentages it’s easier to think in odds: if you take a whole bunch of observations, only about 1 in 370 of them will fall more than 3 Sigma away from your actual value. So once you get to that level of “Wow! That probably shouldn’t have happened,” you start to think, “Maybe this is something other than the measurement I was trying for.
Maybe that’s something above the noise in my values.” So with the Higgs Boson, if you’re looking in a chamber for a particle to randomly come out of the energy of what they’re colliding in the detector, there’s a random chance that something’s just going to happen, and then there’s the “you see it happening more often than random would predict,” and if you see it happening at the 1 in 370 level, that’s a 3 Sigma detection.
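The 1-in-370 and 1-in-507-million figures quoted in this episode fall straight out of the tails of the Gaussian bell curve. A short Python check (an editorial illustration, not part of the show), using the complementary error function:

```python
import math

def two_sided_tail_odds(n_sigma):
    """Probability that pure Gaussian noise lands more than n_sigma
    away from the mean (on either side), and the equivalent 1-in-N odds."""
    p = math.erfc(n_sigma / math.sqrt(2))  # two-sided tail probability
    return p, 1.0 / p

# 1 sigma: ~68.3% of values fall inside, so ~1 in 3 fall outside.
# 3 sigma: roughly 1 in 370 falls outside.
# 6 sigma: roughly 1 in 507 million falls outside.
for n in (1, 3, 6):
    p, odds = two_sided_tail_odds(n)
    print(f"{n} sigma: 1 in {odds:,.0f} by chance")
```

Running this reproduces the odds Pamela cites: a 3 Sigma fluctuation happens by chance about 1 time in 370, and a 6 Sigma fluctuation about 1 time in 507 million.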
Fraser: And didn’t they, I mean, they said…or that maps over to something like a… What was it? Like a 99% possibility?
Pamela: Right, of it actually being real. Now, the thing with the Higgs Boson detections is they haven’t actually gotten up to 3 Sigma yet. They’re looking at almost 2 Sigma detections, and in reality, what we really hope for is a 6 Sigma detection. This is where there’s a 1 in 507 million chance of the thing occurring randomly. So once something is that rare in a random distribution, you start to say, “Huh! Maybe that’s real.”
Fraser: Or it was…remember last week we were talking about the confirmation of the multiverse theory, right? It happened to exist in a universe where all the particles were lining up in the wrong place. But right, it’s funny: if I’m 99% right, then that’s pretty right, but it’s quite surprising that physicists are so reserved in their judgment that they need a 99.99999…I forget how many nines…to be really comfortable that they’ve found the Higgs Boson before they’ll announce it. And you know, I think when physicists say, “We’ve found a particle,” they are serious.
Pamela: Well, and the thing with the Higgs Boson that I’ve really enjoyed watching is they’re not only doing one experiment over and over at a variety of different energies, trying to prove that it exists with that one detector at that one set of energies, but they actually have a whole variety of different experiments going on. What’s kind of awesome is that through using the ATLAS experiment, the CMS experiment, the DZero group, all of these different experiments that do very different things (and that are a story for an entirely different show), they’re systematically ruling out energy levels, so they’ve pretty much completely ruled out everything at a higher energy than 128 giga-electron volts/c squared, which is just a number – deal with it. But with all these experiments, they’ve ruled out that region, and with both ATLAS and CMS, they’ve managed to say, “there seems to be something going on between 115 and 127 giga-electron volts/c squared” – again just a set of numbers, but two different experiments using two different methodologies have come up with the same range of “Huh! There might be something here.” Now the problem is the “there might be something here” is such a weak detection that it’s hard to say it isn’t just something random in the background noise of, well, what the Universe is constantly doing. There are constantly particles coming in and out of existence. There’s lots of stuff going on at every single moment, and it’s hard to know: are they just detecting the tail-end of what the Universe is always doing, or an actual detection of the Higgs Boson?
Fraser: So is that an accuracy problem? You’ve got noise coming in that’s pushing everything in the wrong direction?
Pamela: So accuracy and precision refer to your data. Is it…are all of your data coming in at the same energy level? So the range of 115-127 describes the precision of the measurement. Accuracy says whether or not it’s true, but neither of those reflects on the noise. The noise is, well, just how well can you detect this? So it’s sort of like saying you are throwing darts in the dark, and you’re trying to figure out while feeling your board if you actually managed to get everything in the right place. So it’s…you’re just not quite sure what happened because you can’t see it well enough, so this is getting above the noise issues.
Fraser: Right, so what is the method, then, that scientists use to manage their precision and their accuracy? What do they do so that, in the end, other scientists will say, “Yes, we agree. You’ve found the Higgs Boson”?
Pamela: So here it’s a three-pronged problem. You have accuracy of your results, and that comes from, basically, narrowing a window down and proving nothing’s outside of that window. All of the detections we have are within that window. Precision is getting the window smaller and smaller and smaller, and then there’s the “seeing it above the noise.” The “seeing it above the noise,” that’s the seeing in the dark problem, and to get to that, you just have to build more and more sensitive detectors, so in this case, it’s things like with Atlas, the fiber optics they’re using… I actually worked on building this instrument when I was an undergrad. Didn’t know at all what I was doing; I was just weaving fibers…I was an undergrad.
Pamela: So I was working on it when I was at Michigan State, and one of the things that we had to do was mix optically-perfect epoxy that didn’t have any bubbles in it, and every single fiber was checked to make sure that it had the expected light through-put, so if you put a set amount of light in one end, you get the set amount of light out the other end, and any fiber that didn’t meet “spec” got thrown out, and we had to redo that entire fiber optics assembly. So with things like Atlas, they work very hard to make sure they understand exactly how sensitive the system is, and then make it absolutely as sensitive as possible — basically, the “no photon goes unmeasured” type of a set up.
Fraser: And, I guess, the same thing with the accuracy, right? Which is that, you know, you need to make sure that every piece of your experiment is performing as expected.
Pamela: Right, right. So when you say something is at 114 giga-electron volts/c squared, you actually mean 114, not 122. When you say 115, you don’t actually mean 123. That’s where the accuracy comes in: you actually know where you are. And the precision is “When I detect it, I’m certain it’s where I think it is.”
Fraser: And the other big concept that we mentioned at the beginning of this show is reproducibility, so how does that come into play with science?
Pamela: This is the “neutrinos faster than light” problem, where they said, “OK, can somebody else in the world reproduce this?” Because you never know what might be a problem with your system. When they first detected the cosmic microwave background, they blamed their equipment, not the Universe. They assumed it was something wrong with their set-up: pigeon poop, something… and so not only did they do everything they could to make sure it wasn’t noise created by their electronics, but they also went out and said, “OK, can someone else reproduce this?” When supernovae are detected, the first thing people do is put out a call: “Can someone else detect this with their detector?” And it’s by having multiple instruments reproducing the same results that you get your first confirmation. Now, when it comes to ideas that change our understanding of the Universe, it’s not enough to say this one experiment has been reproduced with this telescope, this telescope, that telescope, or multiple cyclotrons, or the same type of experiment at multiple institutions. You actually have to come up with complementary experiments. So when we say the Universe is filled with dark matter, we base that bold statement on the fact that we see things rotating at speeds that can only be explained by there being more stuff out there than we can observe. We base it on seeing lensing of distant galaxies – microlensing and macrolensing events of different types. We base it on looking at the cosmic microwave background and at the assembly of galaxies. All of these different lines of evidence go into dark matter. The same thing with the Big Bang – there are multiple lines of evidence.
Fraser: And eventually we’ll be able to take a “cup of Universe” and sort it into regular matter, dark energy, and dark matter, and actually have detected it directly, and know this is dark matter because we could detect it here with our instruments. And that’s the piece of the puzzle that’s still being worked on. That’s kind of like the next big job.
Pamela: And this is again where we’re starting to see experiments that are making claims that they’re detecting a particle, but we don’t know if that’s really above the noise, we don’t really know “Can you trust that?” And so this is where the repeatability comes into the experiments.
Fraser: So then, people are hearing and discovering new theories talked about on the internet, both fairly mainstream stuff like the Higgs Boson and the dark matter experiments and things like that, but also some of the more exotic claims like the discovery of “faster than light neutrinos.” I mean, who wouldn’t love faster-than-light travel? So a lot of this stuff gets announced. When a lay person hears that kind of stuff, what sort of filter should they use, based on what we’ve been talking about today, as a way to put everything in perspective?
Pamela: So any time you hear an announcement that says it’s going to fundamentally change our understanding of the Universe, you need to ask, “Has this been repeated?” That should always be your first question, and when I say repeated, I mean by somebody else. When they first announced dark energy back in 1998, what was so amazing is there were two competing teams that didn’t particularly like one another, because they were competing for financial resources. These two competing teams both came up with the exact same result using supernovae: our Universe isn’t behaving the way we thought, and it’s not behaving the way we thought in the exact same way for both sets of data; therefore, we have something we’re going to call dark energy. Since then we’ve been able to add more and more credence to that idea by looking at the cosmic microwave background, and by looking at models for how large-scale structures formed over time. So the first thing you ask is “Has it been confirmed by more than one experiment?” If it’s been confirmed by more than one group of people doing the exact same experiment, it’s fair to go, “Huh, I can start to think that’s true.” The next thing you should ask is “Well, OK, so it was done in more than one place, but how good was the detection?” Back in 2010, there was what looked like, for about one month, really good evidence for a 2 Sigma detection of the Higgs Boson, where both Fermilab and CERN came up with similar results, and then realized, “No, that was the Universe, not a detection of the Higgs Boson; just background noise that always sits there.” So then you have multiple experiments, and then you have a high threshold above the noise, so you’re certain there’s something actually there. Once you know there’s something actually there, and it’s repeatedly actually there, then you start saying, “OK, so can I find another way to detect the same phenomenon?
Is there another different type of experiment that somebody else has done?” So this is where you get the multiple lines of evidence all proving the same thing.
Fraser: And will scientists typically speak in this “Sigma” way of describing…
Pamela: Oh, yeah.
Fraser: Oh, really?
Pamela: Yeah, we’re really lazy. We like to say it’s a 6 Sigma result, it’s a 2 Sigma result, and so yeah, we’re pretty bad about that.
Fraser: Right, but it’s like 5 stars, 2 stars…
Pamela: Exactly. That’s exactly what it is. Six Sigma is “Wow! OK, I can trust this number!” But you also have to be careful. Sometimes we lie with numbers; we don’t mean to, but we do. You can have a 6 Sigma detection above background of a bright object, but just because you have a 6 Sigma detection doesn’t mean it is what you think it is. This is where the whole supernova problem comes in. You can have a 6 Sigma bright detection on your image, and what you’ve actually detected is something totally different from a supernova: a cosmic ray hit your detector, an airplane flashed on your detector… all sorts of different things could have caused a 6 Sigma detection of something radically different from what you claimed it was.
Fraser: And so then, based on that, you know, this sort of reliance on 6 Sigma, et cetera…what is a way that you could decide if something is complete and total nonsense, right? Just by hearing the vaguest hint of it, what is the thing that a scientist depends on, which a pseudo-scientist will avoid at all costs?
Pamela: If the person doesn’t have error bars on their measurement, they probably don’t know how accurate or precise their measurement is, so you need to have error bars. The experiment needs to have been done more than once, and just because something’s published doesn’t mean it’s been done more than once. The research journals are filled with experiments that were done once, written up, published, and never repeated again, or that, when they were repeated, didn’t get the same results, and the null results that came later were never published, because people don’t publish null results.
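As a minimal editorial sketch of what “error bars” means in practice: one common convention is to report a repeated measurement as its mean plus or minus the standard error of the mean. The readings below are hypothetical.

```python
import math
import statistics

def measurement_with_error_bar(samples):
    """Report repeated measurements as (value, error bar), using the
    standard error of the mean as the error bar."""
    mean = statistics.mean(samples)
    sem = statistics.stdev(samples) / math.sqrt(len(samples))
    return mean, sem

# Hypothetical repeated measurements of the same quantity.
readings = [9.9, 10.1, 10.0, 9.8, 10.2, 10.0, 9.9, 10.1]
value, err = measurement_with_error_bar(readings)
print(f"{value:.2f} +/- {err:.2f}")
```

Note how the error bar shrinks as you take more measurements (it scales as one over the square root of the number of samples), which is why repeating an experiment tightens the result.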
Fraser: [missing audio] You should.
Pamela: I know you should, but that doesn’t mean people do. So you need error bars and you need repeatability.
Fraser: So if you could, like, ask some pseudo-scientist, “Let’s see your error bars.”
Fraser: If no error bars show up… OK, so let’s say they do have a working understanding of statistics, and they’re able to provide you with some error bars. What would you want to ask for next?
Pamela: Well, then you have to start critiquing their experiment. You have to start saying, “OK, did you take into account this? Did you take into account that?” What we thought was the first detection of a planet going around a pulsar turned out to actually just be forgetting to take into account the Doppler shift of the planet Earth, so you have to be very careful in how you design your experiments. And if someone has what sounds like a phenomenal result, you have to make sure they didn’t just rediscover a systematic effect that was there all along and had nothing to do with what they thought it was.
Fraser: And is that the real value of the peer review process: to look for those error bars for starters, but then to, you know, hammer on each piece of the experiment to find out what’s right and what perhaps could have introduced some error?
Pamela: At the end of the day, none of us remembers everything every time, and this is where collaborators help keep you honest. If you have a good collaborator, they’re going to go, “Did you think of…? Did you think of…?” It’s a harrowing experience if you have a really good collaborator, because they will basically beat you up to make sure you thought of everything, but you want a collaborator like that, even when it doesn’t feel friendly. Then, once you and your collaborators have gone through the “Did you think of…?” questions, peer review should do the next round of “Did you think of…? Did you think of…?”, and you move forward only if you can answer yes, or answer, “This is how it would affect the result,” to all of those. Peer review should help that happen, and that’s where good science comes from: that dialogue of “Did you think of…?”
Fraser: Well, I think that about wraps up this week. Thanks a lot, Pamela.
Pamela: My pleasure.
This transcript is not an exact match to the audio file. It has been edited for clarity.