ramen and pickles

science, technology, and medicine served up with some tasty noodles

Monthly Archives: January 2013

Shifting sands: Tracking down errors in R

Excellent overview on tracking down errors in R code from Pete Werner.

 

 

Shamelessly copying his post content below without permission:

 

 

TUESDAY, JANUARY 29, 2013

Tracking down errors in R

 

It’s that moment we all know and love: somewhere in our code something has gone wrong. We think we have done everything right, but instead of expected glory we find only terse red text laid below our lintel.

 

This can be very frustrating, and troubleshooting these issues can often be very time consuming.

 

All is not lost. There are a few bits of R that can greatly help in finding out what exactly has gone wrong and where, which in turn should suggest a reasonable course of action.

 

First we will look at some simple methods we can use to track down issues, namely the warn option and traceback, and then we will look at stepping through functions with debug.

 

First Steps

 

I’m going to use an example that has stuck with me from when I first started using R, using neural nets for classification with the iris data.

 

Let’s take a look at the error:

 

library(nnet)
X <- iris[,1:4]
Y <- iris[,5]
mod <- nnet(X, Y, size=2)

 

# weights: 19
Error in nnet.default(X, Y, size = 2) : 
  NA/NaN/Inf in foreign function call (arg 2)
In addition: Warning message:
In nnet.default(X, Y, size = 2) : NAs introduced by coercion

 

Urgh. Really? What the hell does that mean?

 

We can use the built-in traceback() function to see where this error occurred:

 

> traceback()
2: nnet.default(X, Y, size = 2)
1: nnet(X, Y, size = 2)

 

We can see our call to nnet(), which in turn has called nnet.default(), which is where our error has come from.

 

In the error output, we can see there was also a warning, "NAs introduced by coercion". As we weren't expecting any warnings, let's track that down first, as errors tend to compound.

 

Warnings

 

To find out where that message was coming from, we will set the warn option to a specific level via options(warn = 2), which turns warning messages into errors.

 

The default is warn = 0, which means warnings will be stored until the top-level function returns. We could use warn = 1, which will print each warning as it is encountered, but in this case we want to stop straight away, so we will set it to 2.
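To see the three levels side by side, here is a quick sketch using a made-up helper function (noisy() is not from the post, just an illustration):

```r
# A toy function that raises a warning, to illustrate the warn levels.
noisy <- function() {
  warning("something odd")
  42
}

options(warn = 0)   # default: warnings are held until the top-level call returns
noisy()

options(warn = 1)   # warnings are printed immediately as they occur
noisy()

options(warn = 2)   # warnings are promoted to errors and stop execution
tryCatch(noisy(), error = function(e) conditionMessage(e))

options(warn = 0)   # restore the default
```

At level 2 the error message is prefixed with "(converted from warning)", which is exactly what we are about to see from nnet.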

 

options(warn = 2)

 

Let’s try again:

 

> mod <- nnet(X, Y, size=2)
# weights: 19
Error in nnet.default(X, Y, size = 2) : 
  (converted from warning) NAs introduced by coercion
>

 

Hmm, still coming from nnet.default, let’s see if traceback() is offering any new information.

 

> traceback()
6: doWithOneRestart(return(expr), restart)
5: withOneRestart(expr, restarts[[1L]])
4: withRestarts({
       .Internal(.signalCondition(simpleWarning(msg, call), msg, 
           call))
       .Internal(.dfltWarn(msg, call))
   }, muffleWarning = function() NULL)
3: .signalSimpleWarning("NAs introduced by coercion", quote(nnet.default(X, 
       Y, size = 2)))
2: nnet.default(X, Y, size = 2)
1: nnet(X, Y, size = 2)

 

We see a whole bunch of extra stuff in the traceback(), but from entry 3 onwards it appears to be side effects of having set warn = 2. We do, however, see our warning has come from nnet.default again, so we will dig into that to see if we can find out what is going on.

 

Getting dirty with debug()

 

To do this, we can use the debug function. We will turn on debugging for nnet, which will let us step through the code line by line as it is executed.

 

> debug(nnet)
> mod <- nnet(X, Y, size=2)
debugging in: nnet(X, Y, size = 2)
debug: UseMethod("nnet")
Browse[2]> 

 

The Browse> prompt tells us we are in the debugger. The debug: UseMethod("nnet") tells us the next line of code to be executed is UseMethod("nnet"). We could enter 'n' here to continue to the next line; however, a convenient default is just hitting enter (i.e. an empty line).

 

Browse[2]> 
debugging in: nnet.default(X, Y, size = 2)
debug: {
    net <- NULL
    class(net) <- "nnet"
    net
}
Browse[3]> 

 

Here R has printed out the R source for the function we just entered. We can see we are still at our Browse> prompt, so we will continue by hitting enter again and again:

 

Browse[3]> 
debug: net <- NULL
Browse[3]> 
debug: x <- as.matrix(x)
Browse[3]> 
debug: y <- as.matrix(y)
Browse[3]> 
debug: if (any(is.na(x))) stop("missing values in 'x'")
Browse[3]> 
debug: NULL
Browse[3]> 

 

Well this is neat. Hitting enter at the prompt, R shows us each line that is about to be executed. We will continue on hitting enter until we see our error message.

 

Browse[3]> 
… 
Browse[3]>
debug: if (length(weights) != ntr || any(weights < 0)) stop("invalid weights vector")
Browse[3]> 
debug: NULL
Browse[3]>

 

One thing of interest: we can see a conditional if statement is about to be run. When the condition evaluates to false, meaning the conditional code won't be executed, we will see debug: NULL printed out.

 

Browse[3]> 
debug: Z <- as.double(cbind(x, y))
Browse[3]> 
Error in nnet.default(X, Y, size = 2) : 
  (converted from warning) NAs introduced by coercion

 

Well, there is our warning message. Unfortunately we have lost the Browse> prompt, meaning we are no longer inside the function being debugged, but back at the main prompt.

 

This is a side effect of our rather aggressive options("warn") setting. Let's tone it down a bit and set it to 1, so the warnings will be printed as they occur, then jump back into debugging.

 

> options(warn=1)
> mod <- nnet(X, Y, size=2)
debugging in: nnet(X, Y, size = 2)
debug: UseMethod("nnet")
Browse[2]> 
debugging in: nnet.default(X, Y, size = 2)
debug: {
    net <- NULL
Browse[3]> 
Browse[3]> 
debug: Z <- as.double(cbind(x, y))
Browse[3]> 
Warning in nnet.default(X, Y, size = 2) : NAs introduced by coercion
debug: storage.mode(weights) <- "double"
Browse[3]> 

 

After some time, we get our warning message, and we are still in our debugged function. The warning is coming from the line

 

Z <- as.double(cbind(x, y))

 

What could be the problem here? Something is going wrong when nnet is converting x and y to doubles. Let’s take a look at them to see if there is anything going on.

 

We can do this by using get("variable"), where variable is the quoted name of the variable. First let's take a look at two other variables in the function to see how it works:

 

Browse[3]> get("nout")
[1] 1
Browse[3]> get("ntr")
[1] 150
Browse[3]> 

 

Looking back at the debug output of each line, we can see they were set from the dimension info for X and Y: nout has a value of 1 and ntr has a value of 150.

 

Let’s take a look at x and y now:

 

Browse[3]> get("x")
       Sepal.Length Sepal.Width Petal.Length Petal.Width
  [1,]          5.1         3.5          1.4         0.2
  [2,]          4.9         3.0          1.4         0.2
  [3,]          4.7         3.2          1.3         0.2
  [4,]          4.6         3.1          1.5         0.2
Browse[3]>

 

This is our input X data.

 

Browse[3]> get("y")
       [,1]
  [1,] "setosa"
  [2,] "setosa"
  [3,] "setosa"
  [4,] "setosa"
Browse[3]>

 

And here is our Y data, just as we passed in.

 

Recall the warning was triggered by this line:

 

Z <- as.double(cbind(x, y))

 

Which is converting x and y to the numeric double type.

 

We can see that x is numeric data, while y is character data. What happens if you convert character data to numeric? It doesn’t seem to make sense, but let’s try:

 

Browse[3]> as.numeric(get("y"))
Warning: NAs introduced by coercion
  [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
 [37] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
 [73] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[109] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[145] NA NA NA NA NA NA
Browse[3]> 

 

A-ha!

 

This is just the warning we see, and thinking about it, we can understand why converting strings to numeric data is probably not going to be particularly meaningful.
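To make the failure mode concrete, here is a small sketch (run at the top level, outside the debugger) using the same iris data. The key detail is that as.matrix() turns a factor into character data, and as.numeric() on character yields NA, whereas as.numeric() on the factor itself would have returned the underlying integer level codes:

```r
y <- iris$Species                 # a factor with 3 levels
head(as.numeric(y))               # factor -> underlying integer level codes

ym <- as.matrix(y)                # as.matrix() coerces the factor to character
class(ym[1])                      # now character data, e.g. "setosa"
head(suppressWarnings(as.numeric(ym)))  # character -> NA, with the coercion warning
```

So by the time nnet.default reaches as.double(cbind(x, y)), our factor has already become a character matrix, and the coercion to double can only produce NAs.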

 

Let’s get out of the debugger and think about what might be going on. Hit 'c' to continue execution:

 

Browse[3]> c
Error in nnet.default(X, Y, size = 2) : 
  NA/NaN/Inf in foreign function call (arg 2)
>

 

We are now back at the main prompt.

 

Now what??

 

What is going on? It seems reasonable that we should be able to pass a factor for classification, in fact we are pretty sure that’s what we saw being used in the examples in the nnet package documentation.

 

A careful reading of help(nnet) reveals some details. In particular:

 

If the response in formula is a factor, an appropriate classification network is constructed … If the response is not a factor, it is passed on unchanged to nnet.default.

 

It is possible to pass a factor in Y, but we must use the formula syntax. Looking at the examples, we see the matrix syntax in use as well; however, they transform the y values using class.ind().
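As a quick sketch of what class.ind() does: it expands a factor into a 0/1 indicator matrix with one column per level, which is a numeric encoding nnet.default can happily convert to doubles:

```r
library(nnet)

Y <- iris[, 5]          # the Species factor
ind <- class.ind(Y)     # indicator matrix: one 0/1 column per factor level
head(ind)               # columns: setosa, versicolor, virginica
dim(ind)                # 150 rows, one per observation; 3 columns, one per level
```

Each row has exactly one 1, marking that observation's class.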

 

Right about now I contemplate calling the police, as I’m pretty sure someone has snuck in and changed the docs while I wasn’t looking. There is no way I would make such a simple mistake …

 

Anyway, let’s turn off debugging, and see if we can get this working:

 

> undebug(nnet)
> mod <- nnet(X, class.ind(Y), size=2)
> mod <- nnet(Y~., data=cbind(X, Y), size=2)

 

Both methods run without error (or warning). Success at last.

 

Outro

 

We’ve seen a few ways we can dig into R and track down where things are going wrong. First is using options(warn=2) to make R convert warnings to errors, and using traceback() to find out in which function the issue is arising.

 

Often, this may be enough to get things back on track, especially if the function causing trouble is small. For more complex issues, we can use debug() which will let us step through the function line by line, inspecting variables and internal state as needed.

 

It should be said this will only be of use when the source is available to R, i.e. the function in question has been implemented in R rather than in compiled C/C++/Fortran imported as a shared library. Debugging that is more involved, and there is some reasonable documentation on this available here.

 

That’s all for now, happy hunting and thanks for stopping by.

 


Healthcare expenditure, it’s actually more like 100% of the economy

An interesting topic that has come up in my med school classes on health policy and health economics is the increasing expenditure on healthcare.  Just defining what that means is a bit of a challenge, as you have to take into account things like inflation, the overall expansion of the economy, and so forth, but many people have settled on percent of GDP as a decent way to measure healthcare expenditure.  A fraction is a nice dimensionless number, which should be comparable across time.  I took the figure below from Wikipedia, looking at US healthcare expenditure over nearly half a century.  It does indeed look like it is increasing staggeringly.

[Figure: US healthcare expenditure as a share of GDP over time, from Wikipedia]

However, we have to pause for a moment to think about what we mean by healthcare expenditure, and usually we think of it as paying for things like doctors visits, medicines, time spent in the hospital, and medical devices including everything from pacemakers to wheelchairs.   What is the rest of the economic expenditure then?  Things like food, housing/construction, household goods, entertainment, etc.

 

Why do people pay for these other things?  I would contend that almost all of them are related to healthcare.

 

Historically, individuals put a lot of effort into securing food and maintaining stable access to it.  Why do we pay for food?  Although we can get a lot of enjoyment from food, going to restaurants with friends, and other reasons, we fundamentally eat to keep ourselves healthy.  The same is true of the construction industry: we construct buildings to protect us from the elements, keep us warm, dry, and safe from harm.  We put in plumbing with clean water and hot showers so we can reduce infection.  We have things like shoes and clothing to protect us from injury to our feet, sunburn, frostbite, ticks, and other things that can hurt us.  As we expand our conception of how our expenditures promote our health, we have police and military to keep us safe from the physical harm of people who might want to hurt us, and firemen to keep us safe from fire.  We have an entertainment industry to keep us happy.

 

What has happened over the past millennia is that, as a society, we have gotten very efficient at many of these aspects.  We can produce food, shelter, clothing, clean water, etc. with a much lower expenditure of total human effort.  At the same time, we have dramatically reduced the morbidity and mortality associated with starvation, malnutrition, exposure, infection, and even attacks from wild animals.  These things still happen, and indeed many areas of the industrialized world still have people who experience food insecurity and malnourishment.  However, broadly speaking, in the industrialized world we have massively reduced the illnesses caused by a lack of food and shelter.  We have introduced health problems now due to access to cheap cigarettes and high-calorie cheap food, but we are working as a society to reduce these threats to health as well.

 

This then shifts the health concerns to a more exotic and diverse range of illnesses which demand greater and greater technical skill to address.   Cancer is not a substantial healthcare burden on a society until it has addressed access to food, shelter, and sanitation.  We can view the increasing expenditure on that part of the economy we call healthcare as a great success.  We have been able to become very efficient in the other areas of production, and we are now as a society lucky enough to have substantial numbers of people who live long enough to suffer illnesses like Alzheimer’s and Parkinson’s Disease.  Those are new challenges which we can work to overcome to improve the health of our society.  A society with a larger and larger fraction of its economy dedicated to healthcare means that it is becoming very efficient at producing the other things it needs and can dedicate itself to improving the lives of its sickest members.

 

 

The Ramen Shop, new foodie ramen in Oakland

Local backups on Mac laptops are a drag on resources…

For those of you with a Mac Air or with a laptop with limited drive space, you might find this tip helpful. By default in Lion, Time Machine keeps a local backup of changes in between syncs, so you can end up with this huge file in /.MobileBackups. The point of this is for protection and document versioning on the go. However, this can be a big CPU (presumably battery life) and disk hog. Anyway, you can turn off this lovely “feature” by opening a terminal window and doing: sudo tmutil disablelocal

Ars Technica talks about this and other aspects of Lion:

http://arstechnica.com/apple/2011/07/mac-os-x-10-7/18/

The fix:

http://dear-apple.com/mobile-backups-are-taking-too-much-space-turn

Delicious Snack

If you mix Chex Mix and little chocolate chip cookies, you get the most absurdly addictive and delicious snack.

 

It is about the least healthy thing in the universe, but the cookies provide a rich, silky sweetness, which combines with the salty umami of the Chex Mix in a wonderful storm of deliciousness with a crunchy, complex yet delicate mouthfeel.
