Note: this is part 4 of a series on clustering RNAseq data. Check out part one on hierarchical clustering here; part two on K-means clustering here; and part three on fuzzy c-means clustering here. Clustering is a useful data reduction technique for RNAseq experiments. In previous posts, we discussed the usefulness of hard clustering techniques such as hierarchical clustering and K-means clustering. These techniques assign every gene to exactly one co-expression cluster.
- From the archive (originally published 2017-04-04): Clustering is extremely useful for generating hypotheses and for data exploration in general. The idea is that genes which have similar expression patterns (co-expressed genes) are often controlled by the same regulatory mechanisms (co-regulated genes). Co-expressed genes often share similar functions, so by looking at which genes are found in a cluster we can get an idea of what that cluster is doing. Here we’ll show how to cluster RNAseq data using hierarchical clustering.
- From the archive: Clustering gene expression data allows us to identify substructures in the data and identify groups of genes that behave similarly. This method can help us identify genes that share a biological function (co-functional) and genes that are under the same control logic (co-regulated). Here we’ll show how to cluster RNAseq data using K-means clustering. We’ll address picking an appropriate number of clusters, then test-drive some visualizations and plots.
- From the archive: Machine learning (in the informatics world) is like teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are too. Juvenile comparisons aside, the power of these tools can’t be ignored. Before most machine learning algorithms can be applied to DNA sequences, the sequences must first be converted to binary strings. Here we’ll show how to one-hot encode a DNA sequence in Python using scikit-learn.
- Note: this is part 3 of a series on clustering RNAseq data. Check out part one on hierarchical clustering here and part two on K-means clustering here. Clustering gene expression is a particularly useful data reduction technique for RNAseq experiments. It allows us to bin genes by expression profile, correlate those bins to external factors like phenotype, and discover groups of co-regulated genes. Two common methods for clustering are hierarchical (agglomerative) clustering and k-means (centroid-based) clustering, which we discussed in part one and part two of this series.
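
To make the idea of hard, centroid-based clustering mentioned above a little more concrete, here is a minimal sketch of k-means in plain Python. It is not the code from any post in this series. The function name `kmeans`, the gene names, and the expression values are all made up for illustration, and seeding the centroids with the first `k` profiles is a simplifying assumption, not a recommended initialization.

```python
from math import dist  # Euclidean distance, Python 3.8+


def kmeans(profiles, k, n_iter=20):
    """Minimal k-means (Lloyd's algorithm) on a list of expression profiles."""
    # Assumption: seed centroids with the first k profiles (deterministic toy choice).
    centroids = [list(p) for p in profiles[:k]]
    labels = [0] * len(profiles)
    for _ in range(n_iter):
        # Assignment step: each gene joins its single nearest centroid (hard clustering).
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in profiles]
        # Update step: move each centroid to the mean profile of its members.
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical log-expression values: rows are genes, columns are samples.
genes = ["geneA", "geneB", "geneC", "geneD"]
profiles = [
    [1.0, 1.1, 0.9],
    [5.0, 5.2, 4.9],
    [1.2, 0.9, 1.0],
    [4.8, 5.1, 5.0],
]

labels, centroids = kmeans(profiles, k=2)
clusters = dict(zip(genes, labels))  # gene name -> cluster index
```

Every gene ends up with exactly one cluster label, which is what makes this a hard clustering method. On real data you would normally scale the expression matrix first and use a tested implementation with a better initialization and a convergence check.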