vignettes/vig1_intro.Rmd
Data from bulk RNA-seq, single-cell RNA-seq (scRNA-seq), and spatial transcriptomics all come from spatially organized tissues (except for blood), where such spatial organization plays a role in the properties and functions of the tissues. For this reason, a geographical analogy may be more apt than the traditional analogy of the smoothie vs. fruit salad vs. tart.
Many of you attending this conference in person have traveled to Seattle, and might explore Seattle after the talks. Bulk RNA-seq is an averaging assay, and in analogy, you could consider information averaged over Washington State. For example, on average, Washington State tends to have fewer sunny days per year than California. However, each state has diverse climate and weather, and tourists prepare for weather in individual cities rather than the state average.
Non-spatial scRNA-seq is like a list of tourist attractions in Seattle without spatial locations, where each tourist attraction is analogous to a cell. There’s the Space Needle, the Museum of Pop Culture, Chihuly Garden and Glass, Smith Tower, Pike Place, the Museum of Flight, and so on. These tourist attractions can also be classified based on their characteristics, such as architectural style, function, and era of construction, analogous to gene expression. Dimension reduction can be performed on these numerous characteristics. Compared to the state averages, this would be of more interest to a tourist.
But how does one navigate to various tourist attractions? When you look at the map, you find that certain kinds of tourist attractions tend to cluster in space, such as the cluster of museums in the vicinity of the Space Needle and the cluster of older buildings around Smith Tower. Different regions of Seattle with different vibes and functions are also annotated on the map. There are historical reasons that led to such spatial regions and clustering, such as the 1962 World’s Fair that gave rise to the Space Needle. Here we see how locating the tourist attractions in space points to a deeper understanding of the properties of Seattle. Spatial transcriptomics is like studying a map of Seattle.
Based on how the spatial context is preserved, spatial transcriptomics data collection technologies fall into 5 categories, sorted by extent of current usage (plot showing number of publications per category), though there are gray areas:
Data analysis methods written for spatial transcriptomics can be
broadly categorized as upstream and downtream. In upstream analysis, the
raw data is converted into more usable forms, such as getting the gene
count matrix from fastq files and cell type deconvolution of Visium
spots. Downstream analysis begins with the more usable form of data for
further biological inferences, such as finding spatially variable genes,
spatial regions informed by gene expression, and cell-cell interactions.
`SpatialFeatureExperiment`, as a way to represent data, is more
upstream, while `Voyager`, for exploratory spatial data analysis
(ESDA), is a little more downstream.
In the literature, spatial transcriptomics data is often treated as non-spatial scRNA-seq data once it has been generated, and some data analysis methods for spatial transcriptomics data do not take the spatial information into account and are sometimes aimed at non-spatial data as well. For example, most cell type deconvolution methods don’t take into account spatial autocorrelation in cell type distribution, and deconvolution methods for bulk RNA-seq are sometimes used for Visium data. However, the spatial information presents many opportunities unavailable to scRNA-seq, and this workshop will explore some of these opportunities. Among the opportunities are:
`Voyager` package.

Opportunities presented by spatial data can be explored by leveraging a vast tradition of tools made for geospatial data. Geospatial data broadly fall into two types by representation: vector and raster. Vector data represents the world as points, lines, and polygons, specified by coordinates. For example, polygons would be specified by coordinates of the vertices. Raster data are basically images, where each pixel has a value, though unlike the typical RGB image, raster data can have many different layers analogous to the 3 channels in RGB images. Raster is common in remote sensing.
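
To make the vector vs. raster distinction concrete, here is a minimal sketch in R. It assumes the `sf` package is installed; the coordinates, the 4 x 4 image size, and the 5 layers are made up for illustration, and a plain base R array stands in for a dedicated raster class.

```r
library(sf)

# Vector data: geometries specified by coordinates
pt <- st_point(c(1, 1))
ln <- st_linestring(rbind(c(0, 0), c(1, 1), c(2, 0)))
# A polygon is specified by its vertices; the ring closes on its first vertex
pl <- st_polygon(list(rbind(c(0, 0), c(2, 0), c(2, 2), c(0, 2), c(0, 0))))
geoms <- st_sfc(pt, ln, pl)

st_geometry_type(geoms)
# Areas: 0 for the point and the line, 4 for the 2 x 2 square
st_area(geoms)

# Raster data: a regular grid of pixels, each holding one value per layer.
# Here a 4 x 4 "image" with 5 layers, rather than the 3 channels of RGB.
rast <- array(runif(4 * 4 * 5), dim = c(4, 4, 5))
dim(rast)
# All 5 layer values of the pixel in row 1, column 1
rast[1, 1, ]
```

The vector geometries are stored as coordinates and can be measured exactly, whereas the raster stores values on a fixed pixel grid.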
Vector data can be further classified by the processes generating the data:
For each of these data types, there are already well-established
tools for data analysis. For example, `sf` represents vector data and
makes it behave like a data frame in R, `spdep` can be used for spatial
dependency of areal data, `INLA` can be used for Bayesian spatial
modeling, `gstat` is used for interpolation with geostatistical data
(kriging), `spatstat` is used for point process data, `sfnetworks` for
network data, and `spatstat` also supports point process analysis
within a network.
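
As a rough illustration of how two of these tools fit together for areal data, the sketch below builds a toy grid of polygons with `sf`, treats it as a data frame, and uses `spdep` to test for spatial autocorrelation. It assumes `sf` and `spdep` are installed; the 5 x 5 grid and the `expr` column are hypothetical stand-ins for spots or cells and a gene's expression, not real data.

```r
library(sf)
library(spdep)

# Toy areal data: a 5 x 5 grid of unit squares
bbox <- st_as_sfc(st_bbox(c(xmin = 0, ymin = 0, xmax = 5, ymax = 5)))
grid <- st_make_grid(bbox, n = c(5, 5))
areas <- st_sf(id = seq_along(grid), geometry = grid)

# Hypothetical "expression" value: a smooth left-to-right gradient,
# taken from the x coordinate of each square's centroid
areas$expr <- st_coordinates(st_centroid(grid))[, "X"]

# sf objects behave like data frames: ordinary row subsetting works
right_side <- areas[areas$expr > 3, ]
nrow(right_side)

# Spatial neighborhood graph from shared edges (rook contiguity),
# converted to row-standardized spatial weights
nb <- poly2nb(areas, queen = FALSE)
lw <- nb2listw(nb, style = "W")

# Global Moran's I; a smooth gradient should give a clearly positive value
mt <- moran.test(areas$expr, lw)
mt$estimate[["Moran I statistic"]]
```

Rook contiguity is only one possible choice of neighborhood; other definitions, such as queen contiguity or distance-based neighbors, may be more appropriate depending on the data.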
If we can relate these data types to spatial transcriptomics, then we can take advantage of these data analysis tools:
Some geospatial data analysis methods have already been used in spatial transcriptomics, such as Gaussian process regression (on which kriging is based), the Potts model, spatial point process analyses, conditional autoregressive models, and global and local variants of Moran’s I. However, these spatial transcriptomics packages and data analyses in the literature don’t fully explore the opportunities given by the spatial information mentioned above, nor do they always take advantage of the existing software infrastructure originally built for geospatial data.
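
For reference, the global Moran’s I mentioned above is commonly defined as follows, for values $x_i$ observed at $n$ locations with mean $\bar{x}$ and spatial weights $w_{ij}$ between locations $i$ and $j$. The notation is chosen here for illustration and is not taken from any particular package.

$$
I = \frac{n}{\sum_{i}\sum_{j} w_{ij}} \cdot
\frac{\sum_{i}\sum_{j} w_{ij}\,(x_i - \bar{x})(x_j - \bar{x})}{\sum_{i} (x_i - \bar{x})^2}
$$

Under the null hypothesis of no spatial autocorrelation, the expected value is $-1/(n-1)$. Values well above it suggest that neighboring locations tend to have similar values, and values well below it suggest that neighbors tend to differ.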
This workshop presents the packages `SpatialFeatureExperiment` and
`Voyager`, which bring more of the existing vector geospatial data
analysis tools to spatial transcriptomics. `SpatialFeatureExperiment`
(SFE) implements a new S4 class extending `SpatialExperiment` with `sf`
data frames representing cell or Visium spot polygons, geometries of
other objects, anatomical regions, and tissue boundaries, and spatial
neighborhood graphs. `Voyager` is to SFE what `scater`, `scran`, and
`scuttle` are to `SingleCellExperiment`, implementing basic ESDA using
well-established geospatial tools and plotting functions for geometries
and ESDA results. With `sf`, we focus on vector areal data here, because
of the popularity of Visium and because, although the fluorescent or
H&E images are raster, vector features extracted from the images, such
as cell and nucleus segmentations and transcript spots, are more
commonly used for downstream analyses.
No analogy is perfect, and these are some limitations of the geospatial analogy that would call for data analysis methods specific to spatial transcriptomics:
`spatstat`), the support may be limited.

`Voyager`. On-disk tools exist for geospatial data, but are not supported by SFE yet.