A layman’s guide to working with CIFTI files


[Update: Here is another nice intro to CIFTI files, by Jo Etzel at WashU.]

My research group recently began working with the Human Connectome Project (HCP) dataset, a large database of resting-state fMRI, task fMRI and other brain imaging and demographic data for 500+ subjects.  The HCP is pretty unique in that it combines a large number of subjects with long scans and a standardized scanning protocol.  This distinguishes it from other large neuroimaging datasets, such as ABIDE, ADHD-200 or the 1000 Functional Connectomes Project, which are grassroots compilations of data collected across multiple sites with different scanning equipment, acquisition protocols, preprocessing pipelines and quality control procedures.  Most of the resting-state scans in those datasets are 5-10 minutes long, whereas in the HCP there are 60 minutes of resting-state data collected for each subject.  

Our group and many others are interested in using the HCP to develop and evaluate different ways of analyzing brain imaging data.  However, some aspects of the HCP are new to many researchers, and it’s not always obvious from an outside perspective how to work with and interpret the data.  One example of this is data stored in “grayordinates” using the CIFTI file format.  While the HCP also contains data in volumetric space (in NIFTI file format), working with the grayordinates is appealing for a number of reasons that I won’t elaborate on here, except to say that the idea is to isolate the gray matter, which consists of cortical, subcortical and cerebellar areas.  However, working with CIFTI files can be daunting for a new user, and the documentation is not always very user-friendly.

I recently started working with these files, so based on my experience here is a basic introduction to navigating, reading and writing CIFTI files with MATLAB.  This explanation is based on my limited experience, but it will hopefully be helpful for other novice users.  Please feel free to make additions or corrections in the comments!

Part 1: Navigating CIFTI files

The first thing to know about CIFTI files is that they contain several parts, similar to a structure in MATLAB or a list in R.  In that sense, they are very different from NIFTI files, which are essentially just 3D or 4D arrays of intensities.  Furthermore, there are several types of CIFTI files, including times series (*.dtseries.nii), parcellations (*.dlabel.nii), and scalar images (*.dscalar.nii), and the parts or “fields” contained in each type of CIFTI file are different (though they have several commonalities). 

Consider a CIFTI file representing time series data.  The time series themselves are stored in a VxT matrix, where V is the number of grayordinate “voxels” and T is the number of time points in the series.  (Technically “voxels” only refer to volumetric space, but it’s convenient to have a single term for both surface and volumetric elements… please forgive the abuse of notation.)  The total number of voxels in a CIFTI file is around 90,000, including about 60,000 surface voxels (about 30,000 per hemisphere) and 30,000 subcortical and cerebellar voxels.  The surface voxels are a 2D representation of the cortical gray matter, while the subcortical voxels are still in 3D/volumetric space.  Each surface grayordinate is essentially an average of several cortical gray matter voxels in the original volumetric scan.  So what else does a CIFTI file contain?  Common to all types of CIFTI files are the following fields:

  • brainstructure: a vector of length V with a numerical indicator (1-21) for the major brain structure that each voxel forms part of
  • brainstructurelabel: a vector of length 21 with the name of each major brain structure (e.g. CORTEX_LEFT, CORTEX_RIGHT, CAUDATE_LEFT, CAUDATE_RIGHT, CEREBELLUM_LEFT, CEREBELLUM_RIGHT, etc.)
  • pos: a Vx3 matrix of the x-y-z coordinates of each voxel.  However, only the subcortical and cerebellar voxels have coordinates specified; for surface voxels, the values are missing.

In addition, each type of CIFTI file contains additional information.

  • A time series (*.dtseries.nii) CIFTI file contains time (1xT) and dtseries (VxT), where time is the timing (in seconds) of each time point (0, 0.72, 1.44, …) and dtseries is the time series of each voxel.
  • A parcellation (*.dlabel.nii) CIFTI file contains some parcellation field myfield (Vx1) and myfieldlabel (1xQ), where Q is the number of parcels.  For example, in the HCP groupICA parcellation files (melodic_IC_ftb.dlabel.nii), the parcellation field is named “indexmax“, and Q is the number of ICA components.  (As a side note, I believe “indexmax” refers to how spatial components from a groupICA were used to create a parcellation, namely by identifying the component with the maximum z-score at each voxel.)  In the FreeSurfer anatomical parcellation files ending in aparc.32k_fs_LR.dlabel.nii, theparcellation field is named “x100307_aparc”, and Q=70.
  • A scalar (*.dscalar.nii) CIFTI file contains one or more fields field1 (Vx1), field2 (Vx1), etc., each representing a scalar image of some sort.  For example, in the HCP groupICA spatial maps are represented as this type of file, with the fields x1,….,xQ representing the Q spatial maps.  Structural images for each subject are also represented as .dscalar files.  For example, the files ending in MyelinMap_BC.32k_fs_LR.dscalar.nii contain the field myelinmap_bc, which contain the estimated myelin content of each of the roughly 60,000 surface voxels.

You might have noticed that I never mentioned where CIFTI files store the location of each surface voxel.  That’s because they don’t!  This is because the surface can be “inflated” to varying degrees, from very “wiggly” (showing full anatomical detail) to completely flattened (showing no anatomical detail).  Depending on the degree of inflation, the 3-dimensional location of each surface voxel changes!  For a given degree of inflation, the x-y-z location of each voxel are stored as a different type of file, called a GIFTI.  I’ll mention below how to read and navigate these files in MATLAB too.

Part 2: Reading CIFTI files with MATLAB

I use the FieldTrip toolbox, which includes the function ft_read_cifti.  Once you have the toolbox downloaded and have added to the path, reading a CIFTI file is as simple as mycifti = ft_read_cifti(‘fname.dsomething.nii’).  mycifti is then a MATLAB structure containing fields such as the ones described above.  Each field can be accessed or altered with the “.” operator, i.e. mycifti.fieldname.  

As mentioned above, CIFTI files do not contain the x-y-z coordinates of the surface voxels.  These are instead contained in GIFTI files with names ending in “surf.gii”.  In the HCP dataset, the level of inflation is indicated in the file name and includes “flat”, “inflated”, “mid thickness”, and “very_inflated”.  The left- and right-hemispheres are stored in separate GIFTI files.  The GIFTI toolbox can be used to read GIFTI files into MATLAB.  After installing and adding the toolbox, mygifti = gifti(‘fname.surf.gii’) results in a MATLAB structure with several fields, including vertices, a Vx 3 matrix containing the x-y-z coordinates of each voxel, where VL is around 30,000, the number of surface voxels in one hemisphere of the brain.  To map the voxels in a GIFTI file to the corresponding voxels in a CIFTI file, reference the brainstructure field: all voxels with brainstructure==1 (CORTEX_LEFT) map to those in a GIFTI file for the left hemisphere; all voxels with brainstructure==2 (CORTEX_RIGHT) map to those in a GIFTI file for the right hemisphere.

Part 3: Writing CIFTI files with MATLAB

I have found this to be the most challenging aspect of working with CIFTI files, in part because the documentation for the ft_write_cifti function seems to be somewhat limited.  Furthermore, since there are different types of CIFTI files, you cannot simply run ft_write_cifti(‘fname’, mycifti), even if mycifti is simply the result of mycifti = ft_read_cifti(‘fname.d*.nii’).  Instead, you must provide some additional information in the form of “optional” key-value pairs.  Also, in general I’ve found that you must respect the original file type.  For example, to save a parcellation you should start by reading in a .dlabel file; to save a static image you should start by reading in a .dscalar.nii file; etc.  Here are a few examples.

  • ft_write_cifti(‘fname’, mycifti, ‘parameter’, ‘dtseries’) saves the object mycifti, which has a dtseries field containing the functional data, as the CIFTI file fname.dtseries.nii.
  • ft_write_cifti(‘fname’, mycifti, ‘parameter’, ‘x1’, ‘parameter’, ‘x2’) saves the object mycifti, which has fields x1 and x2 containing two static images, as the CIFTI file fname.dscalar.nii.
  • ft_write_cifti(‘fname’, ‘mycifti’, ‘parcellation’, ‘parcels’, ‘parameter’, ‘parcels’) or ft_write_cifti(‘fname’, ‘mycifti’, ‘parameter’, ‘parcels’) saves the object mycifti, which has a field parcels containing the labels of each voxel, as the CIFTI file fname.dscalar.nii.  (I haven’t figured out how to save a .dlabel file.  Running ft_write_cifti(‘fname’, ‘mycifti’, ‘parcellation’, ‘parcels’) throws an error that indicates that ‘parameter’ is required.  Specifying ‘parameter’ and ‘parcellation’ or just ‘parameter’ results in a .dscalar file.  If anyone knows how to do this, I’d love to know!)

One aspect of working with CIFTI files that I have conspiciously ignored is visualization.  This is obviously a very important aspect of working with imaging data, but for now let’s just say it’s next on my to-do list.  Luckily another member of our group, Mary Beth Nebel, has started working with the Connectome Workbench, and has produced some beautiful figures, including the ones below.  I hope to pick her brain soon and report back!

Scan-rescan reliability by ICA parcel (Q=100):


Group ICA parcellations:


Derivation and interpretation of leverage in SLR


I was recently looking online for an explicit derivation of leverage — the diagonals of the “hat matrix” — in simple linear regression. If X is the n-times-p matrix of explanatory variables in a linear model, then the hat matrix is H=X(X’X)-1X’, so called because it puts the “hat” on the predicted values, since Ŷ = HY.

Leverage (hi) has a lot of nice properties that can be quite informative for model diagnostics. For example, if Yi were to change by 1 unit, then Ŷi will change by hi. Leverage is also proportional to the uncertainty in the predicted values Ŷi, since Var(Ŷ)=σ2H, where σ2 is the variance of the model residuals.

In the simple linear regression (SLR) case, if your model contains an intercept then leverage is of the form

Screen Shot 2015-07-23 at 3.21.41 PM.

Therefore, leverage is similar to Mahalanobis distance, which measures distance from the “center” of the data, standardized by the “scale” or variance. The same is true for multivariate linear regression with an intercept. However, the relationship isn’t entirely obvious (at least to me) from the hat matrix formula, and I couldn’t find the proof anywhere online (I’m sure it’s in someone’s linear models class notes but I didn’t find it from my digging around). So I wrote it up and posted it to my Resources page.

One interesting thing to note: if your model does not contain an intercept, this relationship doesn’t hold.  However, if you center the X’s, the relationship above again holds, but without the 1/n term at the beginning.

Tips for submitting to arXiv for the first time


Today I successfully submitted my first paper to arXiv!  We’ve submitted this paper to a journal, but it hasn’t been published yet, so we wanted to get a pre-print up before advertising the corresponding software packages.  Unfortunately, the process of submitting to arXiv wasn’t painless.  Now that I’ve figured out some of the quirks, however, hopefully your first submission can go a little more smoothly than mine did.

After registering, the first unpleasant surprise was that arXiv doesn’t accept PDF documents created from LaTeX code.  You must submit your LaTeX code and all supporting files.  As far as I can tell, these have to be uploaded one-by-one.  Sorry, I didn’t figure out a way around that one (but let me know if you do!).

After verifying that my code compiled without errors on my computer, I uploaded all my documents, hit “Process”, and … long page of errors!  It turns out arXiv doesn’t accept .bib files and instead requires .bbl files for BibTeX.  So I thought, “this will be easy” and uploaded my .bbl file instead.  I again hit “Process”, and … another long page of errors!

I wasn’t sure how to interpret these errors, so … I gave up for a while.  A few weeks later, I decided to give it another go.  I realized that I had made a couple of mistakes the first time:

  1. Several of my graphics were in PDF format.  This is fine, but I needed to include this line at the top of my preamble (top of my .tex document) to ensure that the .tex file is built using PDFLaTeX instead of LaTeX:
  2. I had probably uploaded some unnecessary files, such as graphics that weren’t included in the final version.  ArXiv doesn’t like this.  Make sure only necessary files are uploaded.

Fixing those two things resulted finally in an error-free compilation.  Horray!  (If at this point you are still struggling with compilation errors, I highly recommend reading arXiv’s help page on Considerations for TeX Submissions.)

One final annoyance was the abstract.  I had included an abstract in my document, but arXiv has its own abstract form, and my current abstract exceeded the character limits.  Once I had pared it down, however, I got a couple of errors related to “unknown characters”.  I didn’t have anything crazy in the text, so this seemed a little ridiculous, but I went ahead and removed a ‘%’ sign and a couple of quotation marks and hyphens, which took care of the errors. I then went back and removed the abstract from the .tex document and re-uploaded it.  Submission (finally) complete!

Props to arXiv for a very nice set of help pages, the most relevant of which for this post is Submitting an Article.  This includes links to the TeX Submissions help page referenced above, among many others.

P.S. If you’re curious about the paper, it’s titled “Improving Reliability of Subject-Level Resting-State fMRI Parcellation with Shrinkage Estimators” and is available here.

Three ways to use MATLAB from R


Being a statistician working in neuroimaging is a little like living abroad and trying to speak a foreign language. For example, my first language is English, but I spent my first summer as a PhD student doing research at LMU in Munich, Germany. I had taken German in college and could have really basic conversations, but for more demanding situations I would switch to English in a heartbeat, given the option.

In the world of neuroimaging, or at least many corners of it, MATLAB is the official language. Many tools and programs for the analysis of fMRI data are programmed in MATLAB, and many of my collaborators program exclusively in MATLAB. I know MATLAB well enough to do basic things like reading in data, looping and playing with matrices, but beyond the basics, like many statisticians I am much more comfortable in R. Therefore, I often find myself straddling the line between R and MATLAB.

For example, in one of my projects, I used R to read in the fMRI time series, create summary statistics, and do some transformations on those statistics. I then used MATLAB to perform clustering on the results using a method developed by one of my collaborators. Finally, I again used R to visualize the results using brainR.  For that project, it was convenient enough to write separate programs for the different R and MATLAB parts. I saved the results of the first R part in a CSV file using write.table, read that file into MATLAB using csvread, saved the output of the MATLAB part in another CSV file, and finally read that into R using read.table. Not exactly elegant, but it did the job.

However, I also created and ran a simulation that required the same basic process but over many iterations, and the manual workflow I just described was not a realistic option. I needed a way to run MATLAB automatically from within my R program. (In my case, I eventually compromised and did the clustering in R for the purposes of the simulation, a la option 4 below.) I experimented with different ways to use MATLAB from within R, and I found three primary options. Below I’ll show how to do each of these and discuss their pros and cons, but in short the options are (1) to execute a single MATLAB line using the system() command in R; (2) to use a package like R.matlab to send code to the MATLAB server; or (3) to write out a MATLAB program within R using the writeLines() function and run the entire program using the system() command in R.

Option 1: Run a single MATLAB command at a time using system()

The system command in R runs the contents of a character string as a shell command – basically anything you’d type into a Terminal window on a Mac or in a Linux environment, such as the Hopkins cluster. For example, typing system("ls") will return a list of the files in the working directory. If your system has the ability to run MATLAB from the command line (true for the Hopkins cluster), you can take advantage of that through R by typing system("matlab -nodisplay -r 'stuff; to; do; in; matlab;'").

Tip: If you’re working locally on a Mac with MATLAB installed, you’d need to go to the directory where MATLAB is installed: for example, this works on my computer to open MATLAB on the command line: /Applications/MATLAB_R2013a.app/bin/matlab -nodisplay


  • Quick and easy
  • Possible to run multiple lines, separated by semicolons


  • All commands must be contained within a single line
  • Loops and line-to-line dependencies are somewhat tricky
  • Can’t pass data between MATLAB and R
  • If there is an error in your code, the execution will stop and MATLAB will remain open until you re-type exit.
system('matlab -nodisplay -r "a=2; b=1; display(a+b); exit"')
                                 < M A T L A B (R) >
                       Copyright 1984-2013 The MathWorks, Inc.
                         R2013b ( 64-bit (glnxa64)
                                   August 13, 2013

To get started, type one of these: helpwin, helpdesk, or demo.
For product information, visit www.mathworks.com.

ans =


To see an example of the last con – “If there is an error in your code, the execution will stop and MATLAB will remain open until you re-type exit” – suppose I had mis-spelled display as displya:

system('matlab -nodisplay -r "a=2; b=1; displya(a+b); exit;"')
                                 < M A T L A B (R) >
                       Copyright 1984-2013 The MathWorks, Inc.
                         R2013b ( 64-bit (glnxa64)
                                   August 13, 2013

To get started, type one of these: helpwin, helpdesk, or demo.
For product information, visit www.mathworks.com.

Undefined function 'displya' for input arguments of type 'double'.


The double bracket (>>) indicates that we’re still in the MATLAB environment. Simply type exit to close the MATLAB connection and return to R.

Option 2: Use R.matlab to send code to the MATLAB server

R.matlab is a package that communicates with MATLAB, can read and write MAT files, and can pass objects to (and receive objects from) MATLAB. It’s a little tricky to use at first, but the ability to pass objects between R and MATLAB can be very useful. However, this ability is limited to objects of a reasonable size, and can be slow.


  • Can spread commands over multiple lines
  • Loops and dependencies are easier
  • Can pass data back and forth using setVariable() and getVariable()


  • Can be tricky to use
  • Connection to the MATLAB server can be buggy
  • Clunky output
  • Can’t run multiple jobs in parallel jobs on the same node, since the MATLAB server can only be opened once on each node, making it limited for simulations
  • Passing large objects back and forth doesn’t always work
# load the R.matlab library and start MATLAB server
matlab <- Matlab()
isOpen <- open(matlab)
                            < M A T L A B (R) >
                  Copyright 1984-2013 The MathWorks, Inc.
                    R2013b ( 64-bit (glnxa64)
                              August 13, 2013

To get started, type one of these: helpwin, helpdesk, or demo.
For product information, visit www.mathworks.com.

Running MatlabServer v2.2.0
MATLAB v7.x or higher detected.
Saving with option -V6.
Added InputStreamByteWrapper to dynamic Java CLASSPATH.
MATLAB server started!
Trying to open server socket (port 9999)...done.
Connected to client.
if (!isOpen) throw("MATLAB server is not running: waited 30 seconds.")

# set a variable in R and send to MATLB
x <- 10
setVariable(matlab, x = x)
Received cmd: 3
Will read MAT file: "/tmp/RtmpK3nM1k/file1821f2a4d0339.mat"
evaluate(matlab, "x")
Received cmd: 1
"eval" string: "x"

x =


Sent byte: 0
# set a couple of variables in MATLAB
evaluate(matlab, "y=20; z=x+y")
Received cmd: 1
"eval" string: "y=20; z=x+y"

z =


Sent byte: 0
# pass a variable from MATLAB to R
z <- getVariable(matlab, "z")
Received cmd: 1
"eval" string: "variables = {'z'};"
Sent byte: 0
Received cmd: 2
save(tmpname, '-V6', 'z')
[1,]   30

[1] "MATLAB 5.0 MAT-file, Platform: GLNXA64, Created on: Fri Aug 15 09:57:56 2014                                        "

[1] "5"

[1] "little"

# close the MATLAB server
Received cmd: -1
MATLAB server shutdown!
> >>

Option 3: Write an entire MATLAB program using writeLines() and run using system()

The writeLines function can be used to create a MATLAB script (a .m file) from a character vector in R. Each element of the vector will become a separate line in the script. This approach can be useful when you want the MATLAB script to vary based on something in the R environment, such as a subject ID or parameters in a simulation. If, on the other hand, the MATLAB script never changes, you can just create the script in a text editor, rather than using writeLines. In either case, the script can be run from within R using the system command.


  • Can run multiple lines with loops and dependencies
  • Easier to use than R.matlab
  • No problems running multiple jobs in parallel on the same node


  • Passing data directly between MATLAB and R isn’t possible, so must write data to CSV or similar
  • As in option 1, if there is an error in the MATLAB code, the execution will stop and MATLAB will remain option until you type exit.
#set a variable in R and save in a csv file
x <- 10
write.table(x, file='~/x.csv', sep=",", row.names=FALSE, col.names=FALSE)

#make a vector where each element is a line of MATLAB code
#matlab code reads in our variable x, creates two variables y and z, 
#and write z in a csv file
matlab.lines <- c(
    "x = csvread('~/x.csv')",
    "csvwrite('~/z.csv', z)")

#create a MATLAB script containing all the commands in matlab.lines
writeLines(matlab.lines, con="~/myscript.m")

#run our MATLAB script
system("matlab -nodisplay -r \"run('~/myscript.m'); exit\"")
# read in the variable z we created in MATLAB
z <- read.table("~/z.csv")
#remove the temporary files we used to pass data between R and MATLAB
system("rm ~/x.csv ~/z.csv")

Option 4: Stick with one or the other

Of course, another solution is to stick with either R or MATLAB, if it’s feasible. Opening the MATLAB server can be slow and clunky, making it problematic in a simulation setting, and passing large amounts of data back and forth can be difficult or impossible. In my simulation, I actually ended up avoiding MATLAB entirely for the sake of speed.

However, it takes time to re-write existing code in a different language, especially one you aren’t super comfortable with (Google Translate for code, anyone?). I’d love to one day be as comfortable with MATLAB as in R, as I hope to one day speak German as well as I speak Spanish (and to speak Spanish as well as I do English). Of course the only way to do any of these things are to practice coding/speaking, but full immersion is time consuming and not always practical, and the solutions above can be helpful in the meantime.

Finally, thank you to John Muschelli for explaining option 3. I also heard a rumor that he might be writing a follow-up post with some additional tips…

If you have another way of interfacing between R and MATLAB, I’d love to hear about it!

Why I’m a Screen evangelist (and how to get started)


I’ve recently made several upgrades to my workflow that make me say “WHY OH WHY did it take me this long to start doing this?”  One of the big upgrades was using Sublime + Enhanced-R as explained by Alyssa Frazee to send code line-by-line to the Terminal with a simple keystroke.  Copy and paste no more!  It makes my fingers hurt just thinking about my old ways.  Thing is, I picked up on the Sublime workflow months after Alyssa and others in my department had started using it.  I read Alyssa’s blog post back in August, but it took me a while to take action.

Not this time!  Following a recommendation from Taki Shinohara, I recently started using GNU screen, or “screen” for short, and I am in nerd heaven.  I’m so excited about this that I am constantly asking everyone if they are “using screen yet?”.  I am sure I am getting on some nerves.  But seriously, it’s the awesomest.  Here’s why.

I often work on my department’s cluster interactively, whether for debugging code that I will submit as a batch job later, or just running code that doesn’t really need to be submitted as a batch job.  I love this way of working:  I get the best of the cluster and my computer at the same time. I can use tons of memory (100G in some cases) and my computer doesn’t get bogged down.  At the same time, I can debug as I go, easily track my code’s progress, and write code as I go.  However, there is one major issue that, until recently, stopped me from working this way most of the time: the connection to the cluster could be lost, taking all my code’s progress down with it.  This can happen for a myriad of reasons: a faulty internet connection, I close my computer, I need to go home for the day, the connection simply times out after a while, and so on.  To get over these limitations, I would often use interactive sessions for the purpose of debugging and code building, but submit my code as a batch job when I needed it to run without interruption.  Alternatively, I would often work interactively, but save my progress in data files periodically and reload the saved files if the connection was lost for any reason.  This workflow does the job, but it’s not convenient or failproof.

Enter screen.  Screen allows me to work interactively for an indefinite length of time without ever having to worry about losing the connection.  A quick rundown:

1. I log onto the cluster and start a new screen by typing ‘screen’.

2. I get on a cluster node (‘qrsh’), start R or MATLAB, and submit whatever code I want to run.

3. When I am ready to stop for the time being, I type ctrl-A ctrl-D to “detach” the screen.

I can then close my computer and move on with my life, and my session will still be there the next time I need it.

4. When I’m ready to look at it again, I log onto the cluster and type ‘screen -r’ to “reattach” the screen.  And it’s back, just like magic.  This works even if code is running when I want to stop working.

Screen also works if the connection is inadvertently lost.  If you close your computer or lose your internet connection before detaching the screen, you simply need to “force detach” before reattaching it.  To do this, log onto the cluster, type ‘screen -D’ to force detach and then ‘screen -r’ to reattach.

You can also have more than one screen session open at once, running completely independently.  In this case, each screen session has an ID, and to reattach screen with ID 12345, you’d type ‘screen -r 12345’.  If you don’t know the screen ID, simply type ‘screen -r’ and screen will return a list of each session and their IDs.

Keep in mind that it’s good practice to exit screen (type ‘ctrl-A ctrl-K’ to kill an attached session) when you’re done working to make sure you aren’t taking up resources that might be needed by other users.

That’s it.  Simple and super convenient.  As long as screen is installed on your cluster, you don’t have to do anything additionally to start using it and stop losing work!

What is quantitative MRI, and why does it matter?


This post is going to be a gentle introduction to MRI, which is a type of brain imaging and my current focus of research.  I’m planning a future post or two about my research, so I thought I’d provide a little background for people who don’t know much, if anything, about MRI.  I am by training a statistician by way of industrial engineering, so I am by no means a neurologist, radiologist, physicist or other type of “-ist”, meaning this is going to be pretty basic.  This particular post will be about something called relaxation times and quantitative MRI scans.

I got interested in neuroimaging about a year ago, thanks to Elizabeth Sweeney, who mentored me through the process of getting up to speed.  It’s a fascinating area for biostatistics, as it combines big data, nice visualizations, and interesting and impactful biological questions.  As imaging technologies like MRI get more sophisticated, scientists are really just starting to understand the details about how the brain works.  And statisticians are just starting to get involved in this learning process, so there are a lot of opportunities to look at existing questions and problems from a new, more statistical perspective.

For about the past year, I have been working on a project having to do with quantitative MRI with Elizabeth, Taki Shinohara, and people at the NINDS.  I’ll talk more about that project in a future post, but for now here’s a basic introduction to the topic.

Imagine yourself lying flat in an MRI scanner.   As the technicians turn the MRI machine on, it creates a strong magnetic field of 3-7 Tesla.  Under that magnetic field, the protons in your brain, which were previously pointing in every direction, will all align in one direction, parallel to the magnetic field.  Then, the technicians will turn the magnetic field off, and your brain’s protons will go back to pointing their original directions.  MRI scanners can measure the “T1 relaxation times”, or the time it takes for your protons to “relax” back to their original positions.  At least, they can try to.  More on that in a bit.

But first, why measure the T1 relaxation time?  The interest in relaxation times began in 1971, when it was discovered that tumors have higher relaxation times than regular tissue.  Since then, studies have established that people with certain disorders, including multiple sclerosis (MS), tend to have widespread increases in T1 relaxation times, indicating neurological degradation, including loss of axons and myelin in the brain.  And while some affected tissue, like tumors and lesions, are easy to see on a regular, unitless MRI scan, other tissue with elevated T1 doesn’t look visibly different on MRI than healthy tissue.  That’s where quantitative images that measure T1 (“T1 maps”) come in.  Though affected tissue may not look different to the naked eye on the scan, the measured T1 tells the story.  If we can measure T1 in normal-appearing gray and white matter, we can better understand the widespread neuro-degenerative impacts of diseases like MS.  Through longitudinal studies of T1, we can better understand long-term disease burden and hopefully design better treatments and relapse detection methods.

So if T1 maps are so great, what’s the problem?  There are several major issues with T1 maps, compared with the more common clinical MRI scans, such as T1-weighted, which offer a contrast image but lack interpretable units.  Here’s an example of a T1 map (A) and a T1-weighted image (B).

  • As is clearly seen, the T1 map has high levels of noise and poor signal to noise ratio compared with the T1-weighted image.  This introduces high levels of variability into the estimation of T1 levels and makes detecting subtle differences in T1 much more difficult.
  • T1 maps are more difficult to obtain, since the MRI machine must be very carefully calibrated.
  • They take longer to acquire, which can be problematic for some patient groups.
  • T1 maps are not included in most standard protocols.
  • Historically T1 maps have not been routinely acquired, which means they are missing from many long-term, longitudinal studies of multiple sclerosis and other neurodegenerative diseases.

This is where my research comes in.  Given a standard protocol, consisting of 4 clinical MRI scans, we want to predict the T1 map.  If we can do this with a reasonable degree of accuracy, we can overcome the problems above, as our estimated T1 maps will:

  • Have better signal to noise ratio, similar to the contrast images used in the T1 map prediction model.
  • Be easy to obtain using only the standard protocol of T1-weighted, T2-weighted, FLAIR and PD-weighted volumes.
  • Not require any additional scan time.
  • Be retroactively available for longitudinal and cross-sectional imaging studies of MS and other neurodegenerative diseases.

Stay tuned for a future post on how we are trying to do this.