One of my hobbies lately has been to get either RNA-Seq or microarray data from GEO and do quick analyses. Not only is this fun, but I also find good examples to use for teaching biology.
One of these fun examples comes from some Arabidopsis data. In this experiment, some poor little seedlings were taken out of their happy semi-liquid culture tubes and allowed to dry out. This simulated drought situation isn't exactly dust bowls and hollow-eyed farmers, but the plants don't know that and most likely respond in a similar way.
We can get a quick idea of how the plants feel about their situation by looking at the data in a volcano plot.
These graphs are called "volcano plots" because the data points look like they are spewing out of a volcano. There are about 28,000 spots on this graph. Each spot represents one gene whose RNA was measured in this experiment. When a gene is "expressed," RNA copies of it are produced. We can measure that RNA to find out what changes are happening inside a cell.
We often say that genes are "turned on" when they're expressed (i.e. RNA copies are being made) and "turned off" when RNA copies are no longer produced. We can find out a lot by measuring the concentration of RNA.
On the x axis of our volcano plot, we see the change in RNA concentration when we compare two conditions. The first condition is the average of the control samples (think "happy plants"). The second condition is an average of several samples taken over a 24-hour period while the plants were drying out; I pooled lots of samples and time points into this second condition.
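To make the x axis concrete, here is a minimal Python sketch of how one gene's position could be computed. It assumes the expression values are on a log2 scale (RMA-normalized microarray values are), and the numbers are made up for illustration. It is not GeneSifter's own code, and the function names are hypothetical.

```python
from statistics import mean

def log2_fold_change(control, drought):
    """x-axis position for one gene: mean log2 expression in the drought
    samples minus mean log2 expression in the control samples.

    Assumes both lists hold log2-scale values, so a difference of log2
    values equals the log2 of the ratio. Positive = higher in drought.
    """
    return mean(drought) - mean(control)

def fold_change(log2_fc):
    """Convert a log2 fold change back to a plain ratio (1.0 = no change)."""
    return 2 ** log2_fc

# Hypothetical log2 expression values for a single gene.
control = [7.0, 7.2]              # two control ("happy plant") samples
drought = [8.1, 8.3, 8.0, 8.4]    # drought samples pooled across time points

x = log2_fold_change(control, drought)
print(round(x, 3), round(fold_change(x), 2))
```

Pooling all the drought samples into one mean mirrors how the second condition was built here; it also hides any differences between early and late time points.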
If a data point falls in the middle of the graph, the RNA levels from that gene are roughly the same in both happy and stressed-out plants. These spots are colored gray because I told the software (GeneSifter from Geospiza) that I only wanted to see genes whose expression changed at least 1.5-fold.
The side that a spot falls on tells us how gene expression changed as the plants dried out. If a gene produced more RNA in happy plants, it appears as a blue spot on the left side of the graph. If a gene produced more RNA in drying plants, it appears as a red spot on the right side.
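The coloring rule described above can be sketched in a few lines of Python. This is a reconstruction from the description in this post, not GeneSifter's actual logic; it assumes the x axis is a log2 fold change (drought over control) and uses only the 1.5-fold threshold. The gene names and values are hypothetical.

```python
import math

# Threshold from the post: genes must change at least 1.5-fold to be colored.
FOLD_CUTOFF = 1.5
LOG2_CUTOFF = math.log2(FOLD_CUTOFF)   # about 0.585 on a log2 axis

def spot_color(log2_fc):
    """Classify one gene by its x-axis position (log2 drought / control).

    gray: changed less than 1.5-fold in either direction
    blue: higher in control plants (left side)
    red:  higher in drying plants (right side)
    """
    if abs(log2_fc) < LOG2_CUTOFF:
        return "gray"
    return "red" if log2_fc > 0 else "blue"

# Hypothetical genes and their log2 fold changes.
genes = {"gene_A": 1.10, "gene_B": -2.30, "gene_C": 0.20}
for name, lfc in genes.items():
    print(name, spot_color(lfc))
```

Working on a log2 axis makes the threshold symmetric: a 1.5-fold increase and a 1.5-fold decrease sit the same distance from the center.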
Our graph shows us that some genes are turned on in happy plants, and some genes are turned on in stressed plants. GeneSifter also lets me click spots and see what the genes are, but we'll skip that for now.
We can also see how much gene expression changed by looking at the distance from the center. Points that are farthest from the center show us genes whose expression changed to a greater extent.
The y axis shows us the significance of the result: points that are higher are more significant. In this case, it means that the replicate measurements for a gene were closer together, relative to the size of the change. This graph makes it look like the data from the happy plants are more significant. This is probably because those two samples came from the control plants, whereas the other measurements represent average RNA concentrations from plants sampled at many different time points.
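Why "closer together" means "more significant" can be seen from a t statistic. The sketch below uses Welch's t test as an illustrative choice; which t test variant GeneSifter runs is an assumption here. It also assumes the y axis is -log10(p), the usual volcano-plot convention, which the post does not state outright. All values are hypothetical.

```python
import math
from statistics import mean, variance

def welch_t(control, drought):
    """Welch's t statistic for one gene (drought minus control).

    The numerator is the difference in means; the denominator grows with
    the spread of the replicates. Tighter replicates give a larger |t|,
    which gives a smaller p value and a higher point on the plot.
    Each group needs at least two samples to estimate a variance.
    """
    se = math.sqrt(variance(control) / len(control)
                   + variance(drought) / len(drought))
    return (mean(drought) - mean(control)) / se

def volcano_height(p_value):
    """y-axis value under the common volcano-plot convention: -log10(p)."""
    return -math.log10(p_value)

# Two hypothetical genes with the same mean difference (1.0 on a log2 scale).
tight = welch_t([7.0, 7.1, 6.9], [8.0, 8.1, 7.9])   # replicates agree closely
loose = welch_t([6.0, 7.0, 8.0], [7.0, 8.0, 9.0])   # replicates are scattered
print(round(tight, 2), round(loose, 2))
print(volcano_height(0.05), volcano_height(0.001))
```

Both genes shift by the same amount, but only the one with consistent replicates gets a large t statistic, so it would sit much higher on the plot.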
Anyway, the RNA measured in the top graph came from the stems and leaves of the plants.
Look what happens when we examine the roots!
Do the graphs look different? What do you think is going on in the drought-stressed roots?
Assuming that it's array data, are you retrieving raw or normalized data?
The data were normalized with RMA. GeneSifter was also set up to process the data with a t test, using a p value cutoff of 0.05.
I should add that this is microarray data, but you can't assume that from the graph. GeneSifter makes the same kinds of graphs with RNA-Seq data, too.
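Combining the two settings (a 1.5-fold change and a t test p value below 0.05) amounts to a two-part filter. Here is a minimal Python sketch, assuming each gene already has a log2 fold change and a p value; treating the cutoff as strictly "p < 0.05" is an assumption, and the gene results are hypothetical.

```python
import math

FOLD_CUTOFF = 1.5   # minimum fold change used for the plots
P_CUTOFF = 0.05     # t test p value cutoff

def passes_filters(log2_fc, p_value, fold_cutoff=FOLD_CUTOFF, p_cutoff=P_CUTOFF):
    """True if a gene changed at least fold_cutoff-fold AND has p < p_cutoff."""
    return abs(log2_fc) >= math.log2(fold_cutoff) and p_value < p_cutoff

# Hypothetical (log2 fold change, p value) pairs for a few genes.
results = {
    "gene_A": (1.10, 0.003),   # large change, significant
    "gene_B": (1.10, 0.200),   # large change, not significant
    "gene_C": (0.20, 0.001),   # significant, but under 1.5-fold
}
kept = [name for name, (lfc, p) in results.items() if passes_filters(lfc, p)]
print(kept)
```

A gene needs both a large enough change and consistent replicates to pass; either condition alone is not enough.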
I guess pretty much the same thing is going on as in the stems and leaves, except to a lesser extent. The roots can retain the moisture that they do collect, rather than sharing it with the rest of the plant, so they do not suffer from the drought as badly.