Note: this is part 4 of a series on clustering RNAseq data. Check out part one on hierarchical clustering here; part two on K-means clustering here; and part three on fuzzy c-means clustering here. Clustering is a useful data reduction technique for RNAseq experiments. In previous posts, we discussed the usefulness of hard clustering techniques such as hierarchical clustering and K-means clustering. These techniques assign every gene to exactly one co-expression cluster.
- From the archive: Machine learning (in the informatics world) is like teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are too. Juvenile comparisons aside, the power of these tools can’t be ignored. Before most machine learning algorithms can be applied to DNA sequences, the sequences must first be converted to a numeric representation, typically binary vectors. Here we’ll show how to one-hot encode a DNA sequence in Python using scikit-learn.
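As a rough illustration of the idea, here is a minimal sketch of one-hot encoding with scikit-learn's `OneHotEncoder`. The helper name `one_hot_encode_dna`, the A, C, G, T column order, and the choice to encode ambiguous bases such as `N` as an all-zero row are assumptions made for this example, not necessarily what the archived post does.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder


def one_hot_encode_dna(sequence):
    """Return an (n_bases, 4) integer array with columns ordered A, C, G, T.

    Hypothetical helper for illustration only.
    """
    # OneHotEncoder expects a 2D array: one row per base, one feature column.
    bases = np.array(list(sequence.upper())).reshape(-1, 1)

    # Fixing the categories keeps the column order stable even when a
    # sequence is missing one of the four bases. With handle_unknown="ignore",
    # any other symbol (e.g. N) becomes a row of zeros (an assumption here).
    encoder = OneHotEncoder(
        categories=[["A", "C", "G", "T"]],
        dtype=int,
        handle_unknown="ignore",
    )

    # The encoder returns a sparse matrix by default; convert it to dense.
    return encoder.fit_transform(bases).toarray()


encoded = one_hot_encode_dna("ACGTN")
print(encoded)
```

Each row corresponds to one base in the sequence and contains at most a single 1, so a sequence of length n becomes an n x 4 matrix that most machine learning libraries can consume directly.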