Thursday, August 15, 2024

Installing data.table R package on M1 Macs

To use multiple threads with data.table, OpenMP support has to be enabled when the package is installed (compiled). Without it, loading the package prints a warning like this:


data.table 1.x.x using 1 threads (see ?getDTthreads). 

********

This installation of data.table has not detected OpenMP support. It should still work but in single-threaded mode. This is a Mac.

********


The data.table wiki has the installation instructions:

https://github.com/Rdatatable/data.table/wiki/Installation

0. You may need to install gfortran first:

brew install gfortran

and add these lines to the ~/.R/Makevars file:


LDFLAGS += -L/opt/homebrew/opt/libomp/lib -lomp

CPPFLAGS += -I/opt/homebrew/opt/libomp/include -Xclang -fopenmp
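If you re-run the setup, it is easy to end up with these lines in Makevars twice. Below is a minimal sketch that appends each line only if it is not already present. It assumes libomp was installed with Homebrew under /opt/homebrew (the default Homebrew prefix on Apple silicon), which is where the paths above point; the MAKEVARS override variable and the append_once helper are hypothetical names used only for illustration.

```shell
#!/bin/sh
# Target file: per-user build settings that R reads when compiling packages.
# MAKEVARS is a hypothetical override; by default it is ~/.R/Makevars.
MAKEVARS="${MAKEVARS:-$HOME/.R/Makevars}"

# Append a line to a file only if that exact line is not already there.
append_once() {
  line="$1"; file="$2"
  # -q quiet, -x match the whole line, -F treat the line as a fixed string.
  if grep -qxF -- "$line" "$file" 2>/dev/null; then
    return 0
  fi
  # If the file is non-empty and does not end with a newline, add one first
  # so the new line is not glued onto the previous one.
  if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
    printf '\n' >> "$file"
  fi
  printf '%s\n' "$line" >> "$file"
}

mkdir -p "$(dirname "$MAKEVARS")"
# Paths assume Homebrew's libomp under /opt/homebrew.
append_once 'LDFLAGS += -L/opt/homebrew/opt/libomp/lib -lomp' "$MAKEVARS"
append_once 'CPPFLAGS += -I/opt/homebrew/opt/libomp/include -Xclang -fopenmp' "$MAKEVARS"
```

Running the script a second time leaves the file unchanged, so it is safe to keep in a setup script.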


1. Download the OpenMP runtime from mac.r-project.org and extract it:

curl -O https://mac.r-project.org/openmp/openmp-16.0.4-darwin20-Release.tar.gz

sudo tar fvx openmp-16.0.4-darwin20-Release.tar.gz -C /

The second command extracts the files into /usr/local/lib and /usr/local/include.

You can inspect the downloaded tar.gz file by uncompressing it to see how it is structured.
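You can also list the archive without extracting anything (tar -tzf) and confirm that every entry will land under /usr/local before running the sudo tar command. A minimal sketch, assuming the archive stores paths relative to / (as the extraction with -C / implies); check_tarball is a hypothetical helper name:

```shell
#!/bin/sh
# Print every entry of a gzipped tarball, then fail if any entry
# would be extracted outside usr/local/ when using `-C /`.
check_tarball() {
  tarball="$1"
  # -t lists entries, -z handles gzip, -f names the archive; nothing is extracted.
  entries=$(tar -tzf "$tarball") || return 1
  printf '%s\n' "$entries"
  # Strip a leading "./" so "./usr/local/lib" and "usr/local/lib" are treated alike,
  # then keep only entries that are not usr, usr/local, or something below usr/local.
  outside=$(printf '%s\n' "$entries" \
    | sed -e 's|^\./||' -e 's|^\.$||' \
    | grep -Ev '^(usr(/|/local(/.*)?)?)?$')
  if [ -n "$outside" ]; then
    echo "Entries outside usr/local/:" >&2
    printf '%s\n' "$outside" >&2
    return 1
  fi
  echo "All entries are under usr/local/."
}
```

For example, check_tarball openmp-16.0.4-darwin20-Release.tar.gz should print the file list and, if nothing unexpected is in it, the final confirmation line.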


2. Install data.table from the source tarball with the OpenMP flags:

PKG_CPPFLAGS='-Xclang -fopenmp' PKG_LIBS=-lomp R CMD INSTALL /Users/ashish/downloads/data.table-1.15.4.tar.gz 

This installs the package with the compiler and linker flags it needs to detect the OpenMP installation.


The next time you load the package, it should use multiple threads. To check, run this at the R prompt:


library(data.table)

getDTthreads()



Saturday, August 10, 2024

Python setup tools

Python is a very popular programming language, and it is somewhat easier to read than many others. It is the best second choice for many tasks, including data analysis and plotting. R is amazing for my use case, and I feel it is the better choice, especially for statistical tests and plotting; ggplot2 is nearly unrivaled in its ergonomics.

However, whenever I am determined to use Python, I am overwhelmed by the setup process. This post focuses on setup on macOS. (I won't cover Conda at all, since for some reason it has sown more confusion.)

This is a draft of the post, which will be updated as I learn more.

1. Python Version: 

2. pip (the official way to install packages)

3. venv (comes with Python 3)

4. uv (not as platform-agnostic as Poetry): this is more focused on resolving package dependencies than pip is. So far it has worked well for me. The basic workflow:

    1. Create a virtual environment:

            uv venv uenv1

    2. Activate the virtual environment:

            source uenv1/bin/activate

    3. Install your packages:

            uv pip install pandas

VS Code should detect the folder in which the virtual environment was created.
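To confirm that the activated environment is actually the one in use: inside a virtual environment, Python's sys.prefix points at the environment folder, while sys.base_prefix still points at the base interpreter. The sketch below wraps that check in a small shell function; in_venv and the PYTHON override variable are hypothetical names for illustration. After source uenv1/bin/activate, it should report the uenv1 folder.

```shell
#!/bin/sh
# Succeed (exit 0) if the chosen Python runs inside a virtual environment.
# PYTHON is a hypothetical override; by default, use whatever python3 is on PATH.
in_venv() {
  "${PYTHON:-python3}" -c 'import sys; sys.exit(0 if sys.prefix != sys.base_prefix else 1)'
}

if in_venv; then
  echo "Using virtual environment: $("${PYTHON:-python3}" -c 'import sys; print(sys.prefix)')"
else
  echo "Not inside a virtual environment"
fi
```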

5. pipx

I installed pipx with brew install pipx and then used it to install uv.

pipx is effectively a subset of pip: it can only install command-line tools, not libraries such as pandas or Polars that need to be imported. The pipx docs describe this well: https://pipx.pypa.io/stable/comparisons/

6. Poetry

7. PDM

8. PyPy 



Adding GPG keys to a GitHub account

GitHub has a vigilant mode, which verifies that a commit was made by the user, as confirmed by their GPG or SSH key. By default, if you make...