• How to get GO terms from RefSeq IDs

    A Gene Ontology analysis can add a lot of value to any omics study. Mapping GO terms to a newly sequenced genome or transcriptome can be a challenge, especially if the model system is… diverged. My typical functional annotation workflow usually involves: BLASTing gene sequences against RefSeq (although I typically use PLAST for this step since it’s much, much faster than BLAST); BLASTing gene sequences against the UniProt databases Swiss-Prot and TrEMBL.
  • One-hot encode a DNA sequence using Python and scikit-learn

    From the archive: Machine learning (in the informatics world) is like teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are too. Juvenile comparisons aside, the power of these tools can’t be ignored. Before most machine learning algorithms can be applied to DNA sequences, the sequences must first be converted to binary vectors. Here we’ll show how to one-hot encode a DNA sequence in Python using scikit-learn.
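
As a rough illustration of the idea, here is a minimal sketch of one-hot encoding with scikit-learn's `OneHotEncoder`. It is not necessarily the approach taken in the full post. The helper name `one_hot_encode`, the fixed A/C/G/T column order, and the choice to encode ambiguous bases such as N as an all-zero row are assumptions made for this example.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Fixed column order for the four nucleotides (an assumption for this sketch).
BASES = ["A", "C", "G", "T"]


def one_hot_encode(sequence):
    """Return a (len(sequence), 4) array with one row per base.

    Hypothetical helper: bases outside A/C/G/T (e.g. N) become all-zero rows.
    """
    # OneHotEncoder expects a 2D array: one row per sample, one column per feature.
    column = np.array(list(sequence.upper())).reshape(-1, 1)
    # Passing explicit categories keeps the column order stable across sequences;
    # handle_unknown="ignore" maps unexpected symbols to a row of zeros.
    encoder = OneHotEncoder(categories=[BASES], handle_unknown="ignore")
    # The encoder returns a sparse matrix by default, so convert it to dense.
    return encoder.fit_transform(column).toarray()


encoded = one_hot_encode("ACGTN")
print(encoded)
```

Each row corresponds to one position in the sequence and each column to one base, so a sequence of length n becomes an n × 4 matrix that can be flattened or fed directly to a model.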