ramen and pickles

science, technology, and medicine served up with some tasty noodles

Monthly Archives: July 2013

Circulating your “To be submitted” manuscript

Circulate your manuscript prior to submission to everyone you work with to get feedback and make sure everyone is properly acknowledged in a way that makes everyone happy (or at least get multiple opinions to ensure fairness).  The Neuroskeptic has a nice blog post about circulating your manuscript to your labmates/colleagues before submission.   I really wish this practice was more widely adopted.

Many people are in large research groups or work with many collaborators.  It can be hard to keep track of what everyone is doing, which means missed opportunities for collaboration or mutual assistance.   In research, there are also many opportunities for people to contribute ideas and knowledge in ways that sometimes don’t get acknowledged in publications. For many in academia, their currency is their ideas and knowledge.

Circulating your paper to everyone you work with is a good opportunity to make sure everyone who is credited as a co-author or in the acknowledgements is happy with how they are included and that no one who thinks they should be included has been left out.   Often people will not be happy with the final result, but at least you will have offered an opportunity for discussion on this issue.  The main author (whether first or last in the author list) can be the main arbiter, but it is polite and generally a good idea to give everyone who thinks they contributed to the paper an opportunity to discuss their contribution, and how they are included/acknowledged, with the main author.   It sets a good foundation for future collaboration.   For example, it’s easy to overlook the contribution of a summer student or technician who worked on a project briefly, but often that work was a substantial contribution, and including them as a middle co-author can provide a substantial boost to their long-term education/career goals.  I also remember hearing a talk by Bill Hersh, an expert on research literature text mining, where he mentioned that the more authors a paper had, the more citations it got.  Now this might be because large consortium/collaboration papers are more impactful, but if I remember correctly this was a residual effect, having more to do with the fact that with more authors, more people were finding the paper or publicizing it.

There are even technical solutions.  Recently, I was working on a project that ended up with a patent submission.   Since everyone had worked on separate things and contributed very different amounts, it was hard to allocate patent royalties (each person gets x%), so the project lead set up a survey where each person could give their idea of a fair allocation, and then those results were averaged.  One could imagine a prisoner’s dilemma type of situation, but in practice we are all going to continue working together, so everyone had some incentive to be fair, and we ended up with a pretty fair allocation of credit in this crowdsourced way.
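As a rough sketch of the averaging step, assuming each respondent proposes a percentage split that sums to 100 (the names and numbers below are made up for illustration and are not from the actual survey):

```r
# Hypothetical survey: each row is one respondent's proposed split (in percent)
# across three contributors; every row sums to 100.
proposals <- rbind(
  c(50, 30, 20),
  c(40, 40, 20),
  c(60, 20, 20)
)
colnames(proposals) <- c('alice', 'bob', 'carol')

# Average each contributor's share across all respondents.
allocation <- colMeans(proposals)
allocation       # alice 50, bob 30, carol 20
sum(allocation)  # 100
```

Because every proposal sums to 100, the averaged allocation does too, so no renormalizing is needed.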

Giving everyone involved an opportunity to comment on how credit in the paper (author list and acknowledgements) is allocated makes a fair and equitable outcome more likely, I think.

Just because science is sometimes very competitive doesn’t mean it needs to be uncivilized.


Latin to sound fancy

I took some Latin in high school and even won a prize for some translation, so I appreciate it when someone validates that time spent by highlighting the importance of knowing a little Latin, so I liked this blog post:  Latin Words and Phrases Every Man Should Know.  Unfortunately, there were a lot of things on there I didn’t know.   So I will have to group myself with Shakespeare, who was eulogized by Ben Jonson:  “thou hadst small Latin and less Greek,” so that’s not bad company to be in.  Peppering everyday conversation with these phrases seems like a surefire way of making people think you are a pompous ass (nolo contendere).

Knowing some Latin (and a little Greek) does occasionally come in handy in medicine, but not very much really, and sometimes it’s not that informative.  For example, one of the anatomical terms used frequently is foramen ovale, which means “oval hole.”  It can refer to a hole in the septum of the heart or a hole in the skull that some important nerves and blood vessels travel through.  The point is that “oval hole” isn’t really that special of a term; it just sounds a lot nicer in Latin than in English I guess and because it’s in Latin we can know that it refers to that particular anatomical opening and not any old random oval hole I suppose.

In another post, I should do equal coverage and make fun of uses of Greek, particularly idiopathic and iatrogenic.   They are nice uses of fancy sounding words to give an air of competency, when usually quite the opposite is the case.

 

R spells for data wizards

I found this great post from Thomas Levine:

http://thomaslevine.com/!/r-spells-for-data-wizards/

In it, he goes through some helpful tips and tricks for a bunch of common situations in R where things are a bit nonintuitive to a new user.   It’s really targeted for new-intermediate users.  When you’re ready to be more hardcore, you can move on to the R Inferno.  I am going to blatantly copy/mirror Levine’s tips.

——————————————————————

CSV

When loading a CSV, don’t convert strings to factors.

read.csv('csvsoundsystem.com/soundsystem.csv', stringsAsFactors = FALSE)

When writing a CSV, don’t add the rownames.

write.csv(iris, file = 'iris.csv', row.names = FALSE)
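Here is a minimal round-trip sketch of both tips together. The data frame, its column names, and the temporary file are all made up for illustration:

```r
# Hypothetical data frame; the column names are invented for this example.
df <- data.frame(name = c('ramen', 'pickles'), servings = c(2, 5),
                 stringsAsFactors = FALSE)

# Write without row names so no extra unnamed first column ends up in the file.
path <- tempfile(fileext = '.csv')
write.csv(df, file = path, row.names = FALSE)

# Read it back, keeping strings as character vectors rather than factors.
df2 <- read.csv(path, stringsAsFactors = FALSE)
str(df2)
```

Since R 4.0.0 the default is already stringsAsFactors = FALSE, so the first tip matters mostly on older versions of R.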

Indexing

It’s easy to miss a level of indexing, especially with lists.

str(list(a = 3)[1][[1]])
# num 3

str(list(a = 3)[1])
# List of 1
# $ a: num 3

str(list(a = 3))
# List of 1
# $ a: num 3
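The difference between single and double brackets is easier to see side by side. This is only a small sketch with a made-up list:

```r
# Hypothetical list used only for illustration.
x <- list(a = 3, b = 'noodles')

single <- x['a']    # single bracket: still a list, of length 1
double <- x[['a']]  # double bracket: the element itself
dollar <- x$a       # $ also extracts the element by name

class(single)  # "list"
class(double)  # "numeric"
```

If a function complains that it was handed a list when you expected a number, a missing second bracket is a likely culprit.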

You can use character vectors as indices.

row.names(HairEyeColor)
# [1] "Black" "Brown" "Red"   "Blond"

row.names(HairEyeColor) <- c('Pink', 'Blue', 'Green', 'Clear')
HairEyeColor['Pink',,]
#        Sex
# Eye     Male Female
#   Brown   32     36
#   Blue    11      9
#   Hazel   10      5
#   Green    3      2

HairEyeColor[,,'Male']
#        Eye
# Hair    Brown Blue Hazel Green
#   Pink     32   11    10     3
#   Blue     53   50    25    15
#   Green    10   10     7     7
#   Clear     3   30     5     8

Factors

Factor levels are sorted by default (alphabetically for strings, numerically for numbers).

levels(factor(10:1))
# [1] "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10"

If you want to change that, just create a new factor, specifying the level order manually.

factor(parking$GarOrLot, levels = c('G', 'L'))

And you rename a level or levels like so.

levels(OrchardSprays$treatment)[3:5] <- c('X', 'Y', 'Z')
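To see the default ordering, the reordering, and the renaming in one place, here is a small sketch using a made-up vector of sizes:

```r
# Hypothetical values, invented for illustration.
sizes <- factor(c('small', 'large', 'medium', 'small'))
levels(sizes)  # "large" "medium" "small" (alphabetical by default)

# Rebuild the factor with the levels in the order we actually want.
sizes <- factor(sizes, levels = c('small', 'medium', 'large'))
levels(sizes)  # "small" "medium" "large"

# Rename the third level; every value that had that level changes with it.
levels(sizes)[3] <- 'huge'
as.character(sizes)  # "small" "huge" "medium" "small"
```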

Concatenating text

This is how you concatenate text.

paste('abc', 'def', sep = '')

In JavaScript, this would be 'abc' + 'def'. Sort of. R’s paste is more powerful because it supports vectors! If you pass it vectors, paste will ordinarily concatenate corresponding elements across the vectors.

paste(c('a','b','c'), 1:3)

If you want to concatenate the elements within a vector, use collapse.

paste(c('Pack', 'my', 'box', 'with', 'five', 'dozen', 'liquor', 'jugs.'), collapse = ' ')

In case that isn’t clear, it would look like this in JavaScript:

['Pack', 'my', 'box', 'with', 'five', 'dozen', 'liquor', 'jugs.'].join(' ')
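For reference, here is roughly what those paste calls return, plus how sep and collapse combine. paste0 is not mentioned above; it is the base R shorthand for paste with sep = ''.

```r
# sep goes between corresponding elements; the default is a single space.
spaced <- paste(c('a', 'b', 'c'), 1:3)            # "a 1" "b 2" "c 3"
tight  <- paste(c('a', 'b', 'c'), 1:3, sep = '')  # "a1" "b2" "c3"

# collapse then joins the resulting elements into one string.
joined <- paste(c('a', 'b', 'c'), 1:3, sep = '', collapse = '-')  # "a1-b2-c3"

# paste0 is shorthand for sep = ''.
short <- paste0('abc', 'def')  # "abcdef"
```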

Plotting

Show all factor levels in a ggplot.

library(ggplot2)

ggplot(iris[1:50,]) + aes(x = Species, y = Sepal.Length) +
  scale_x_discrete('Species', drop = FALSE) + geom_jitter()

Also, in general, use ggplot. Base R graphics are more work than they’re worth, except maybe if you’re making music videos.

That said, if you do use base R graphics, try using locator when you’re perfecting the layout.

Maintenance

Update your packages.

update.packages()

.Rprofile

Set your preferred mirror

r <- getOption("repos")
r["CRAN"] <- "http://cran.mirrors.hoobly.com"
options(repos = r)
rm(r)

Remove the carets (the > prompt characters) from the beginnings of the lines so that you can run code that you’ve copied from the shell.

options(prompt="  ")
options(continue="+ ") 

Make the screen wide

options(width=160)

Save your command history and output

Sys.setenv(R_HISTSIZE='100000')
sink(file = paste('~/.history/r-log-', strftime(Sys.time(), '%F %H:%M:%OS9'), '-', sep = ''), split=T)

Higher-order functions

R’s “apply” functions would be called “maps” in other languages. If you’re applying along a list or vector, lapply and sapply are convenient: lapply always returns a list, while sapply simplifies the result to a vector or matrix when it can.

apply maps along any dimension of an array; you specify the dimension as an argument.

mapply maps over several vectors or lists in parallel, passing the corresponding elements as multiple arguments to the function.
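A quick side-by-side sketch of these four functions, using small made-up inputs:

```r
# lapply returns a list; sapply simplifies to a vector when it can.
squares_list <- lapply(1:3, function(i) i^2)  # list of 1, 4, 9
squares_vec  <- sapply(1:3, function(i) i^2)  # numeric vector 1 4 9

# apply works over one dimension of an array: 1 = rows, 2 = columns.
m <- matrix(1:6, nrow = 2)    # rows are (1, 3, 5) and (2, 4, 6)
row_sums <- apply(m, 1, sum)  # 9 12
col_sums <- apply(m, 2, sum)  # 3 7 11

# mapply walks several vectors in parallel, one element from each per call.
pair_sums <- mapply(function(a, b) a + b, 1:3, 4:6)  # 5 7 9
```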

rollapply is really cool. It applies a function with a rolling window. For example, here’s a rolling z-score that Brian wrote.

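library(zoo)

# z is not defined in the original snippet; a standard z-score is assumed here.
z <- function(x) (x - mean(x)) / sd(x)

# z-score of the most recent value relative to the rest of its window
roll_z <- function(x){
    scores <- z(x)
    scores[length(x)]
}

z_change <- rollapply(rnorm(1000), 40, roll_z)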

Other stuff

Use ProjectTemplate.

library(ProjectTemplate)

Use str to find out something’s type.

str(ChickWeight)

sqldf works both on R data.frames and on other databases

library(sqldf)

sqldf('SELECT foo FROM bar') # Use the bar data.frame
sqldf('SELECT foo FROM bar', dbname = 'baz.db') # Use the baz.db SQLite database

Use download.file to download files.

Sort one thing by another thing.

iris[order(iris$Sepal.Length),]
cars$speed[order(cars$dist)]
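order also takes several keys, which covers sorting by one thing and then by another. A small sketch with a made-up data frame:

```r
# Hypothetical scores, invented for illustration.
scores <- data.frame(name  = c('a', 'b', 'c', 'd'),
                     group = c(2, 1, 2, 1),
                     score = c(10, 30, 20, 40),
                     stringsAsFactors = FALSE)

# Sort by group ascending, then by score descending within each group
# (negating a numeric key reverses its order).
sorted <- scores[order(scores$group, -scores$score), ]
sorted$name  # "d" "b" "c" "a"
```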

 

EMR Infographics

What is an EMR?  An Electronic Medical Record, although the acronym is also often used to refer to an electronic medical record system, a computer program or whole IT infrastructure for keeping track of medical records.

Still confused about what an EMR is?  Check out this infographic from HealthIT.gov:

http://www.healthit.gov/patients-families/electronic-health-records-infographic

Or this one from Dell:

http://en.community.dell.com/dell-blogs/direct2dell/b/direct2dell/archive/2011/10/25/the-future-of-electronic-medical-records-infographic.aspx

Want to know more about how EMR systems are used across the country?  Look at this nice infographic from CRT Medical:

http://www.crtmedical.com/emr-by-the-numbers-infographic/

Interested in learning more about particular EMR systems?  This infographic from Capterra shows many different ones:

http://www.capterra.com/infographic-top-20-emr-software-solutions

This popularity index seems to be based on number of installations or overall sales, which is not the same thing as total users.  A large hospital system may have many employees who use the system, while a small clinic may only have a few.

Now that wasn’t so painful, was it?