How to organize a PhD when buried under a mountain of data

I will preface this by saying that I am not an organized person. If you need proof, just look at the picture of my desk below.

Research projects are inevitable in life: their topics range from planning a trip or event to writing a PhD. At least for me, one of the hardest things about doing research is staying organized. But more on that later.

My desk

What is data and how can it become a mountain? Data is defined by the Oxford English Dictionary as “Facts and statistics collected together for reference or analysis.” Nowadays data has infiltrated every aspect of our lives. One of the primary tasks during my PhD has been to identify how microorganisms use basaltic rock as a substrate. To do this I have collected tomography data at a variety of scales, from data sets that resolve features tens of micrometers across to data sets that can be used to observe features larger than 500 nanometers. Now that it is collected, I have to analyse it all. As Pavel has mentioned in earlier posts, a tomography dataset consists of thousands of individual files that together can be used to create a 3D rendering of the object that was scanned.

It is because of this that I have ended up with a mountain of data to climb. The computer on my desk in the image above has 8 TB of storage. Next to my desk is a server with a capacity of ~65 TB, and scattered around my office and apartment are more than 15 portable hard drives, each with a capacity of at least 3 TB. At last look, I had over 40 TB of primary data, all of which must be stored in duplicate. Most of these data will balloon to three times their original file size during the analysis process.
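To get a feel for these numbers, here is a rough back-of-the-envelope sketch in Python. It is only an illustration under loud assumptions: every portable drive is counted at exactly 3 TB (the stated minimum), exactly 15 drives are counted, and all of the primary data is assumed to triple in size during analysis (the worst case). The variable and function names are my own.

```python
# Back-of-the-envelope storage budget, all values in terabytes (TB).
# Assumptions (not measurements): 15 portable drives of exactly 3 TB each,
# a 65 TB server, and ALL primary data tripling in size during analysis.

PRIMARY_TB = 40         # primary data ("over 40 TB")
COPIES = 2              # primary data is stored in duplicate
GROWTH_FACTOR = 3       # size after analysis relative to the original

DESKTOP_TB = 8
SERVER_TB = 65
PORTABLE_DRIVES = 15
PORTABLE_DRIVE_TB = 3


def storage_budget():
    """Return (total capacity, duplicated primary data, analysis data) in TB."""
    capacity = DESKTOP_TB + SERVER_TB + PORTABLE_DRIVES * PORTABLE_DRIVE_TB
    primary_need = PRIMARY_TB * COPIES
    analysis_need = PRIMARY_TB * GROWTH_FACTOR
    return capacity, primary_need, analysis_need


capacity, primary_need, analysis_need = storage_budget()
print(f"Total capacity:          {capacity} TB")
print(f"Duplicated primary data: {primary_need} TB")
print(f"Data after analysis:     {analysis_need} TB")
```

Under these assumptions the hardware adds up to 118 TB, the duplicated primary data to 80 TB, and the analysed data to 120 TB, which is why the mountain feels so steep.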

Datasets of this size are nothing new, and an entire field, Big Data, is dedicated to figuring out how to analyse, store, and manage them. Organizing and managing these kinds of data is not very different from organizing any data or primary research you might conduct during a PhD project, an MSc project or everyday life. The only difference here is magnitude.

I started my PhD over 2.5 years ago, and I went in naively thinking that setting up some folders to save things in an organized fashion would be enough. Little did I know that I would end up with so much data, and ultimately I have had to devise a system for managing it all on the fly. I would not recommend that. It makes things very confusing and is rather unhelpful.

When managing personal datasets and personal research there is no best method, so to speak. The best organization system is one that gets used and one that works for the individual. Note: this is not true for widely used datasets, where versioning, a robust naming method, and consistent organization are key. That said, there are a few things that I have found make life much easier. Choose a method and stick with it. For example, if you start by putting the date in every file name so you know when the file was originally created, then you should continue with that.
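As an illustration of the date-in-the-file-name habit, here is a minimal Python sketch. It is not the system I actually use: the `YYYY-MM-DD_` prefix, the helper names `dated_name` and `has_date_prefix`, and the example file name are all assumptions made up for this example.

```python
import re
from datetime import date


def dated_name(original_name, created=None):
    """Prefix a file name with its creation date in ISO format (YYYY-MM-DD)."""
    if created is None:
        created = date.today()
    return f"{created.isoformat()}_{original_name}"


def has_date_prefix(name):
    """Check whether a name already follows the YYYY-MM-DD_ convention."""
    # First check the shape of the prefix, then check that it is a real date.
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}_.+", name):
        return False
    try:
        date.fromisoformat(name[:10])
    except ValueError:
        return False
    return True


# Hypothetical file name, used only for illustration.
print(dated_name("basalt_scan.tif", date(2017, 3, 14)))
print(has_date_prefix("basalt_scan.tif"))
```

One reason to prefer the year-month-day order is that such names sort chronologically when a folder is sorted alphabetically. A check like `has_date_prefix` can be run over a folder from time to time to spot files that slipped through without a date.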

Personally, for everyday work and everyday analyses I have a panoply of folders that are split up into categories, as you can see in the image below. I also store everything in a paid Dropbox account (not an advertisement, I just love the service) so that all the files are automatically stored in the cloud as well, and very basic versioning is performed. This works passably well for me, but may not work for everyone.

File organization tree

So why does this matter for anyone who is not doing a big academic research project? Everyone has research projects, even if they do not necessarily think of them in that way. Where do I want to go on vacation? Where do I want to host a party? What is the best restaurant in my price range in my city? These are all questions which can be researched in everyday life. There are many ways to do so: a fair number of people like to fly by the seat of their pants, while others create detailed dossiers of their options. Those who take a lackadaisical approach may once have found the perfect restaurant, but cannot remember where it was or how they found it. They then end up not being able to return (I do this all the time). Alternatively, some may compile documents with tens of vacation options only to decide that they are not going this year. Finding a method of organizing files, data, etc. that works for you can streamline your entire research process. I know it certainly worked that way for me.

The mystery of too-deep earthquakes

Look at the depth distribution of earthquakes on Earth (Fig. 1):

Fig. 1: Depths of earthquakes on Earth. Shallow earthquakes (0-60 km) are in red, intermediate-depth earthquakes (60-300 km) in purple and deep earthquakes (>300 km) in blue. Data from the International Seismological Centre.

In general, earthquakes are located at the boundaries between tectonic plates. Shallow earthquakes (< 60 km) happen at all plate boundary types, but intermediate (60-300 km) and deep (> 300 km) earthquakes mainly occur in subduction zones, where one plate moves beneath another. Because these earthquakes are located either within the subducting plate or between the two plates, they get deeper and deeper the further they are from the surface trace of the plate boundary. Because the plate located west of South America moves towards the east and is subducted under South America (Fig. 2), the earthquakes on the west coast of South America get deeper from west to east (Figs 1, 2).

Another way to cook rocks: in water!

Remember the blog post where experiments were compared to cooking? Well, we can take that parallel even further and use more water to heat the rock with! As I mentioned in a previous post, rocks at the seafloor can get altered when fluids move through them. We call these hydrothermal systems, and they can have spectacular venting sites on the seafloor. One way to find out more about the processes behind these systems is to study natural examples that underwent hydrothermal fluid circulation; another is to try to reproduce these processes in the lab. But how can we do this when water is involved?

An adventure on the JOIDES Resolution: One year later

Fig. 1 – (Top) One of the many magnificent sunrises observed by scientists on board the JR; (Bottom) View of the derrick, the tower that holds the drill string, from the Bridge Deck; and (Top-left) all Expedition 360 participants. Image credits: William Crawford, Exp. 360 Senior Imaging Specialist; Jiansong Zhang, Exp. 360 Education/Outreach Officer.

Earlier this year Barbara wrote about ‘Life on board of a scientific drilling vessel’. That interview gave some hints of the unique experience my colleagues and I shared on board the JOIDES Resolution. Now, you might wonder what exactly the JOIDES Resolution (JR) is. The JR is a drilling vessel dedicated to scientific research on ocean and ocean crust dynamics. Different disciplines are involved, from geology (to elucidate the formation of the oceanic crust), to climate change science (to understand how the Earth handled past climatic events), oceanography (to study global water circulation), and microbiology (to track extreme life in rocks forming the ocean floor). Cores of rock are drilled from beneath the ocean floor, giving scientists a glimpse into Earth’s dynamics. The JR works for the international research program IODP (International Ocean Discovery Program), a marine research collaboration that aims at recovering data recorded in seafloor sediments and rocks, and monitoring subseafloor environments.

Rocks never forget!

Figure 1. Elephant rock, Castelsardo, Sardinia, Italy (picture by Vid Pogačnik)

Did you know that some rocks can have an incredible “magnetic memory”? The age of rocks can vary from seconds to billions of years and, besides sometimes being very old, they store information that is useful for reconstructing the history of our planet.

We commonly use the word “memory” to refer to our computer storage capacity or our own ability to remember. Rocks store information too, but unlike us they are able to do it over much longer periods of time. The oldest memory we have is limited to what humankind has experienced, but some rocks are much older than humans. Therefore, it is really important to be able to extract their memories in order to better understand what we didn’t experience ourselves.

This “magnetic memory” relates to certain minerals in rocks (e.g. magnetite, hematite) that are able to record the direction and the intensity of the Earth’s magnetic field when they form.


Surprise: Catching bugs in rocks!

X-ray tomography is a powerful technique that allows us to see very tiny details inside a rock. However, image acquisition is usually just the starting point for image analysis. In order to get quantifiable information, one has to develop specific image processing algorithms. In porous media research, one of the most important processing steps is the development of a task-oriented image segmentation algorithm.

While trying our segmentation algorithm on a 3D image of a sedimentary rock, we found a curious piece of former life! The “worm” you can see in the video is an orthoceras, an ancient mollusk that is often found in sediments.

This carbonate rock was cored from the Miocene carbonate platform of Llucmajor, in Majorca. The rock has undergone re-equilibration from aragonite to calcite (dissolution of aragonite and crystallization of calcite). This reaction led to the formation of porosity (grey parts of the picture). In this case the spatial distribution of the pores was controlled by the pre-existing structure of the rock. This process allowed the shape of the fossil to be preserved, even after re-equilibration and recrystallization into calcite. That is why we can see the orthoceras, although its skeleton has undergone chemical alteration.

Figure 1. This video is a series of 2D slices of a 3D volume. No orthoceras is actually swimming here.
