Thursday, October 10, 2024

Adding GPG keys to a GitHub account

GitHub has a vigilant mode, which marks each commit according to whether it can be verified against the GPG or SSH keys on the author's account. Commits made from the web interface seem to be verified by default. However, we also want commits made from the command line to be verified.

1. Go to settings in your GitHub account.

2. Click on the SSH and GPG keys section. You will notice a section on how to generate a GPG key and add it to your account.

https://docs.github.com/en/authentication/managing-commit-signature-verification

3. Go to "Generating a new GPG key". Since we are using a Mac, we will follow the macOS instructions.

If gpg is not installed, install it using brew.
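
The steps in GitHub's instructions can be sketched as a few shell helpers. This is a rough sketch and not GitHub's own script: the function names (install_gpg, first_key_id, enable_signing) are made up for illustration, and it assumes Homebrew's gnupg formula and a key created interactively with gpg --full-generate-key.

```shell
# Hypothetical helpers; nothing runs until a function is called.

# Install GnuPG with Homebrew only if the gpg command is missing.
install_gpg() {
  command -v gpg >/dev/null 2>&1 || brew install gnupg
}

# Print the long key ID of the first secret key.
# Reads the output of: gpg --list-secret-keys --keyid-format=long
first_key_id() {
  awk '$1 == "sec" { split($2, parts, "/"); print parts[2]; exit }'
}

# Tell git to sign every commit with the given key ID.
enable_signing() {
  git config --global user.signingkey "$1"
  git config --global commit.gpgsign true
}

# Possible usage (run by hand, since key generation is interactive):
#   install_gpg
#   gpg --full-generate-key
#   key_id=$(gpg --list-secret-keys --keyid-format=long | first_key_id)
#   gpg --armor --export "$key_id"    # paste this block into GitHub settings
#   enable_signing "$key_id"
```

The email address on the key should match a verified email on the GitHub account, otherwise the commits will still show up as unverified.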

Thursday, August 15, 2024

Installing the data.table R package on M1 Macs

If you plan to use multiple threads with data.table, OpenMP support has to be enabled when the package is installed/compiled. Otherwise, when loading the package, you will see a warning that says:


data.table 1.x.x using 1 threads (see ?getDTthreads). 

********

This installation of data.table has not detected OpenMP support. It should still work but in single-threaded mode. This is a Mac.

********


The data.table wiki covers the installation steps at this link:

https://github.com/Rdatatable/data.table/wiki/Installation

0. You may need to install gfortran:

brew install gfortran

These lines are also needed in the ~/.R/Makevars file:


LDFLAGS += -L/opt/homebrew/opt/libomp/lib -lomp

CPPFLAGS += -I/opt/homebrew/opt/libomp/include -Xclang -fopenmp
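
One way to add these lines without duplicating them on a second run is a small helper like the one below. It is only a sketch: add_makevars_line is a name I made up, and the /opt/homebrew/opt/libomp paths assume that libomp was installed through Homebrew on an Apple silicon Mac.

```shell
# Hypothetical helper: append a line to a Makevars file unless that exact
# line is already present.
add_makevars_line() {
  local file="$1" line="$2"
  mkdir -p "$(dirname "$file")"   # ~/.R may not exist yet
  touch "$file"
  # -x: match the whole line, -F: treat the pattern as a fixed string
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Possible usage:
#   add_makevars_line ~/.R/Makevars 'LDFLAGS += -L/opt/homebrew/opt/libomp/lib -lomp'
#   add_makevars_line ~/.R/Makevars 'CPPFLAGS += -I/opt/homebrew/opt/libomp/include -Xclang -fopenmp'
```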


1. Download and extract the OpenMP runtime:

curl -O https://mac.r-project.org/openmp/openmp-16.0.4-darwin20-Release.tar.gz

sudo tar fvx openmp-16.0.4-darwin20-Release.tar.gz -C /

This command extracts the files into /usr/local/lib and /usr/local/include.

You can inspect the downloaded tar.gz file by uncompressing it to see how it is structured.
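
To confirm the extraction worked, something like the check below can help. It is a sketch under assumptions: check_openmp is a made-up name, and the two file names (libomp.dylib and omp.h) are what I expect the tarball to contain, so list the archive first if in doubt.

```shell
# Hypothetical helper: report whether the OpenMP runtime and header exist
# under a prefix (defaults to /usr/local). Returns non-zero if any is missing.
check_openmp() {
  local prefix="${1:-/usr/local}" missing=0 f
  for f in lib/libomp.dylib include/omp.h; do
    if [ -e "$prefix/$f" ]; then
      echo "found   $prefix/$f"
    else
      echo "missing $prefix/$f"
      missing=1
    fi
  done
  return "$missing"
}

# Possible usage:
#   tar tzf openmp-16.0.4-darwin20-Release.tar.gz   # list contents without extracting
#   check_openmp                                    # after the sudo tar step
```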


2. Install data.table from source:

PKG_CPPFLAGS='-Xclang -fopenmp' PKG_LIBS=-lomp R CMD INSTALL /Users/ashish/downloads/data.table-1.15.4.tar.gz 

The command above installs the package with the flags it needs to detect the OpenMP installation.


The next time you load the package, it should use multiple threads. Check this at the R prompt after loading data.table:


library(data.table)

getDTthreads()



Saturday, August 10, 2024

Python setup tools

Python is a very popular programming language, and it is somewhat easier to read than many other languages. It is a good second choice for many tasks, including data analysis and plotting. R is amazing for my use case, and I feel it is the better choice, especially when it comes to statistical tests and plotting. ggplot2 is nearly unrivaled for its ergonomics.

However, whenever I am determined to use Python, I am overwhelmed by the setup process. This post focuses on setup in the macOS world. (I won't go over Conda at all, since for some reason it has sown more confusion.)

This is a draft version of the post which will get updated as I learn more. 

1. Python Version: 

2. pip (the official way to install packages)

3. venv (comes with Python 3)

4. uv (not as platform agnostic as Poetry): compared to pip, this is more about resolving package dependencies. So far it has worked well for me, which means:

    1. Creating a virtual environment:

            uv venv uenv1

    2. Activating the virtual environment:

            source uenv1/bin/activate

    3. Installing your packages:

            uv pip install pandas

VS Code should detect the folder in which we created this virtual environment.
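
Before installing anything, it is worth checking that the environment is really active in the current shell. The sketch below relies on the VIRTUAL_ENV variable that the activate script sets; active_venv is a name I made up for illustration.

```shell
# Hypothetical helper: print which virtual environment, if any, is active
# in this shell. Returns non-zero when none is active.
active_venv() {
  if [ -n "${VIRTUAL_ENV:-}" ]; then
    echo "active: $VIRTUAL_ENV"
  else
    echo "no virtual environment active"
    return 1
  fi
}

# Possible usage, following the three steps above:
#   uv venv uenv1
#   source uenv1/bin/activate
#   active_venv && uv pip install pandas
#   deactivate    # leave the environment when done
```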

5. Pipx

I installed pipx using brew install pipx and was then able to install uv.

pipx is meant to cover a subset of what pip does: it installs command-line tools, each in its own isolated environment, but not packages such as pandas or Polars, which need to be imported as libraries. This is aptly described at: https://pipx.pypa.io/stable/comparisons/

6. Poetry

7. PDM

8. PyPy 



Tuesday, April 9, 2024

Comparing R and Python

I have used R for quite some time for data analysis. Especially with the Tidyverse packages, it has been a very decent experience. The ggplot2 package for plotting is mostly intuitive. The synergy of the Tidyverse ecosystem with the bioinformatics and statistical analysis software available on the R platform makes for an awesome combination.

Recently, I have wanted to try out Python for my daily microbiome data analysis. Julia was another option but, for some reason, it still feels incomplete. There have been decent attempts to replicate the tidyverse in Julia, such as Tidier.jl (https://github.com/TidierOrg/Tidier.jl). However, it still feels like a work in progress.

There have been times when trying to code with a more defensive approach in R has led to very cumbersome code, for example when trying to apply try and catch statements. This example, which was similar to my use case, was generated using ChatGPT 4:

