Exploring biological data programmatically with Python & Biopython
This repository contains Python and Biopython programs developed as part of my first-semester undergraduate coursework for the subject Introduction to Biological Data / Biological Systems.
The focus is on fundamental bioinformatics tasks implemented in Python, with an emphasis on biological sequence analysis, data handling, and visualization.
- Course: Introduction to Biological Data / Biological Systems
- Level: Undergraduate (First Semester)
- Purpose: To gain hands-on experience in applying programming concepts to biological sequence data and basic bioinformatics workflows.
- Reverse complement of DNA sequences
- DNA → RNA transcription
- RNA → Protein translation
- Counting individual nucleotides (A, T, G, C)
- Percentage composition of nucleotides
- Processing sequences from input files
- Sequence manipulation using Biopython
- Reading biological sequences from files
- Performing transcription and translation using Biopython utilities
- Counting the number of sequences in an input file
- Nucleotide frequency analysis across multiple sequences
- Reading sequence data from text and FASTA files
- Working with tabular data using pandas
- Organizing biological data for downstream analysis
- Line plots for biological data
- Scatter plots for sequence-related analysis
- Basic visualization using Matplotlib
- Python 3
- Biopython
- pandas
- NumPy
- Matplotlib
- RDKit
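
The basic sequence operations listed above (reverse complement, transcription, nucleotide counts, and percentage composition) can be sketched in plain Python. This is a minimal illustration and not the code from the repository's scripts; the function names and the example sequence are assumptions made for this sketch.

```python
from collections import Counter

# Watson-Crick pairing for the four DNA bases.
COMPLEMENT = str.maketrans("ATGC", "TACG")


def reverse_complement(dna: str) -> str:
    """Complement every base, then reverse the strand."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def transcribe(dna: str) -> str:
    """Coding-strand DNA to mRNA: replace every T with U."""
    return dna.upper().replace("T", "U")


def nucleotide_counts(dna: str) -> dict:
    """Count A, T, G and C; other characters are ignored."""
    counts = Counter(dna.upper())
    return {base: counts.get(base, 0) for base in "ATGC"}


def percent_composition(dna: str) -> dict:
    """Percentage of each base among the counted A/T/G/C characters."""
    counts = nucleotide_counts(dna)
    total = sum(counts.values())
    if total == 0:
        return {base: 0.0 for base in counts}
    return {base: 100.0 * n / total for base, n in counts.items()}


if __name__ == "__main__":
    seq = "ATGCGTTA"  # hypothetical example sequence
    print(reverse_complement(seq))   # TAACGCAT
    print(transcribe(seq))           # AUGCGUUA
    print(nucleotide_counts(seq))
    print(percent_composition(seq))
```

Using `str.translate` keeps the complement step to a single pass over the string; a dictionary lookup per base would work equally well.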
The full directory structure is shown below:
Python-BioInformatics/
├── sequences/
│   ├── manual-transcription.py
│   ├── transcription-biopython.py
│   ├── fasta-to-dna-mrna-protein-sequences.py
│   ├── fasta-to-protein-sequence.py
│   ├── file-translation-to-stop-at-codon.py
│   ├── stop-at-logic-for-translation.py
│   ├── central-dogma-of-biology.py
│   ├── translation-to-stop-at-codon.py
│   ├── manual-translation.py
│   └── purine-pirimidine-translation.py
│
├── complements/
│   ├── complement-and-reverse_complement.py
│   ├── complement-reverse_complement.py
│   └── complement-reverse_complement-from-sequence.py
│
├── statistics/
│   ├── length-of-sequence.py
│   ├── nucleotide-count-from-file.py
│   ├── nucleotide-count-and-length-of-sequence.py
│   └── nucleotide-count-from-file-plot.py
│
├── file_handling/
│   ├── sequence-conversions-from-file.py
│   └── multiple-files-to-sequeces.py
│
├── medical_imaging/
│   ├── displaying-dicom-file.py
│   └── reading-info-from-dicom-file.py
│
└── cheminformatics/
    ├── smiles-to-png-with-atom-numbers.py
    ├── Smiles2image-using-RDKIT.py
    ├── descriptors-from-smiles.py
    ├── smiles2png-with-atomnumbers.py
    ├── smiles-to-pdb-hydrogen.py
    ├── smiles-to-descriptors-RDKIT.py
    ├── smiles-to-morganfingerprint.py
    ├── smile-descriptors-to-aromaticity.py
    ├── smiles-to-PDB.py
    ├── descriptors-from-smiles-as-a-file.py
    └── descriptors-from-smiles-with-atoms-aromaticity.py

Files in `sequences/`:

| File | Description |
|------|-------------|
| manual-transcription.py | Performs DNA → RNA transcription manually by replacing T → U. |
| transcription-biopython.py | Uses Biopython to transcribe DNA sequences into RNA. |
| fasta-to-dna-mrna-protein-sequences.py | Reads FASTA and generates DNA, mRNA, and protein sequences. |
| fasta-to-protein-sequence.py | Converts FASTA DNA sequences directly into protein. |
| file-translation-to-stop-at-codon.py | Translates DNA sequences but stops when encountering a STOP codon. |
| stop-at-logic-for-translation.py | Demonstrates algorithmic logic for STOP-aware translation. |
| central-dogma-of-biology.py | Complete DNA → RNA → Protein transformation. |
| translation-to-stop-at-codon.py | Translates sequences until the first STOP codon. |
| manual-translation.py | Manually maps codons to amino acids without external libraries. |
| purine-pirimidine-translation.py | Identifies purines (A, G) and pyrimidines (C, T, U) in sequences. |
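
The manual and STOP-aware translation scripts described in this table can be illustrated with a short library-free sketch. It assumes the standard genetic code and reading frame 0; the names `CODON_TABLE`, `translate`, and `count_purines_pyrimidines` are hypothetical and are not taken from the repository's files.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq: str, to_stop: bool = False) -> str:
    """Translate DNA (or RNA) in frame 0; '*' marks a STOP codon.

    With to_stop=True, translation ends before the first STOP codon.
    Codons containing unknown characters are reported as 'X'.
    """
    dna = seq.upper().replace("U", "T")
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    return "".join(protein)


def count_purines_pyrimidines(seq: str) -> dict:
    """Count purines (A, G) and pyrimidines (C, T, U) in a sequence."""
    seq = seq.upper()
    return {
        "purines": sum(seq.count(b) for b in "AG"),
        "pyrimidines": sum(seq.count(b) for b in "CTU"),
    }


if __name__ == "__main__":
    dna = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"  # example sequence
    print(translate(dna))                 # MAIVMGR*KGAR*
    print(translate(dna, to_stop=True))   # MAIVMGR
    print(count_purines_pyrimidines("AGCTU"))
```

Trailing bases that do not fill a complete codon are ignored, which is one possible design choice; a stricter version could raise an error instead.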

Files in `complements/`:

| File | Description |
|------|-------------|
| complement-and-reverse_complement.py | Generates DNA complement and reverse complement. |
| complement-reverse_complement.py | Alternative method for generating complement strands. |
| complement-reverse_complement-from-sequence.py | Takes a user-provided sequence and returns complement + reverse complement. |

Files in `statistics/`:

| File | Description |
|------|-------------|
| length-of-sequence.py | Calculates the length of a nucleotide sequence. |
| nucleotide-count-from-file.py | Reads a sequence from a file and counts A, T, C, and G. |
| nucleotide-count-and-length-of-sequence.py | Outputs both nucleotide frequency and length. |
| nucleotide-count-from-file-plot.py | Generates a plotted visualization of nucleotide counts. |
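
As a rough illustration of the file-based statistics above, the sketch below parses FASTA-formatted text with the standard library only and produces one summary row per sequence. The helper names and the in-memory example records are assumptions for this sketch; the repository's scripts may read files differently.

```python
import io
from collections import Counter


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def summarize(handle):
    """Return one dict per sequence with its length and A/T/G/C counts."""
    rows = []
    for header, seq in read_fasta(handle):
        counts = Counter(seq)
        row = {"header": header, "length": len(seq)}
        row.update({base: counts.get(base, 0) for base in "ATGC"})
        rows.append(row)
    return rows


# Hypothetical example data, held in memory instead of a file on disk.
EXAMPLE_FASTA = """>seq1
ATGCGT
TA
>seq2
GGGCCC
"""

rows = summarize(io.StringIO(EXAMPLE_FASTA))
for row in rows:
    print(row)
```

A list of dictionaries like `rows` can be passed directly to `pandas.DataFrame` for tabular work, or its count columns can be handed to Matplotlib for a bar chart.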

Files in `file_handling/`:

| File | Description |
|------|-------------|
| sequence-conversions-from-file.py | Reads a DNA file and converts it to RNA and protein. |
| multiple-files-to-sequeces.py | Loads multiple sequence files and extracts sequences. |
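
The same file-to-RNA-and-protein conversion can be sketched with Biopython's `SeqIO.parse` and the `Seq` methods `transcribe` and `translate`. This is an illustrative example that assumes Biopython is installed; the `convert_records` helper and the two FASTA records are hypothetical, and the data is read from an in-memory string so that no file path has to be assumed.

```python
from io import StringIO

from Bio import SeqIO

# Hypothetical example records in FASTA format.
EXAMPLE_FASTA = """>gene1
ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG
>gene2
ATGAAATAG
"""


def convert_records(handle):
    """Return DNA, mRNA and protein strings for each FASTA record."""
    results = []
    for record in SeqIO.parse(handle, "fasta"):
        dna = record.seq
        results.append({
            "id": record.id,
            "dna": str(dna),
            "mrna": str(dna.transcribe()),
            # to_stop=True ends translation at the first STOP codon.
            "protein": str(dna.translate(to_stop=True)),
        })
    return results


results = convert_records(StringIO(EXAMPLE_FASTA))
print(f"Number of sequences: {len(results)}")
for entry in results:
    print(entry["id"], entry["mrna"], entry["protein"])
```

To read from disk instead, a file path or an open file handle can be passed to `SeqIO.parse` in place of the `StringIO` object.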

Files in `medical_imaging/`:

| File | Description |
|------|-------------|
| displaying-dicom-file.py | Displays medical DICOM images. |
| reading-info-from-dicom-file.py | Extracts and prints metadata from DICOM files. |

Files in `cheminformatics/`:

| File | Description |
|------|-------------|
| smiles-to-png-with-atom-numbers.py | Converts SMILES to PNG with atom numbers labeled. |
| Smiles2image-using-RDKIT.py | Generates molecular images using RDKit. |
| descriptors-from-smiles.py | Extracts basic molecular descriptors from SMILES. |
| smiles2png-with-atomnumbers.py | Additional SMILES-to-image tool with atom indices. |
| smiles-to-pdb-hydrogen.py | Converts SMILES to PDB and adds hydrogens. |
| smiles-to-descriptors-RDKIT.py | Generates descriptor values using RDKit utilities. |
| smiles-to-morganfingerprint.py | Produces Morgan (circular) fingerprints. |
| smile-descriptors-to-aromaticity.py | Calculates aromaticity-related descriptors. |
| smiles-to-PDB.py | Converts SMILES to a PDB structure. |
| descriptors-from-smiles-as-a-file.py | Reads multiple SMILES from a file and generates descriptors. |
| descriptors-from-smiles-with-atoms-aromaticity.py | Computes descriptors + atom-level aromaticity features. |
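
A few of the SMILES-based tasks in this table (parsing, basic descriptors, adding hydrogens, atom-level aromaticity) can be illustrated with a short RDKit sketch. It assumes RDKit is installed; the `describe` helper and the choice of descriptors are assumptions for illustration and may differ from what the repository's scripts compute.

```python
from rdkit import Chem
from rdkit.Chem import Descriptors


def describe(smiles: str) -> dict:
    """Return a few basic properties for one SMILES string."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        # RDKit returns None when the SMILES cannot be parsed.
        raise ValueError(f"Could not parse SMILES: {smiles!r}")
    aromatic = [atom.GetIdx() for atom in mol.GetAtoms() if atom.GetIsAromatic()]
    return {
        "smiles": smiles,
        "mol_wt": Descriptors.MolWt(mol),
        "heavy_atoms": mol.GetNumHeavyAtoms(),
        # AddHs returns a copy with explicit hydrogen atoms.
        "atoms_with_h": Chem.AddHs(mol).GetNumAtoms(),
        "aromatic_atom_indices": aromatic,
    }


if __name__ == "__main__":
    for smi in ("c1ccccc1", "CCO"):  # benzene and ethanol as examples
        print(describe(smi))
```

Checking the return value of `Chem.MolFromSmiles` before computing descriptors avoids an opaque error further down when an input line is malformed.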
Through this coursework and practice, I developed:
- A strong foundation in biological sequence representation
- Practical experience using Biopython for sequence analysis
- Confidence in handling biological data programmatically
- Basic skills in visualizing biological datasets
- An interdisciplinary understanding of programming applied to life sciences
- This repository represents academic learning and practice, not a production-level bioinformatics pipeline.
- Code is written with a focus on clarity and understanding.
- The repository may be extended in the future with advanced bioinformatics or machine learning-based analyses.
This work is based on material covered during the course Introduction to Biological Data / Biological Systems. Reference material is not publicly included to respect academic and copyright boundaries.
This project is licensed under the MIT License.
Copyright (c) 2026
Krish Singh (github.com/wasitkrish)