Eric Leung

Get all dates for a day of the week

2023-07-18T00:00:00-07:00

For some date-specific work, I wanted to get a list of all dates in the year, for a specific day of the week. Here is how I did this using R.

Using {lubridate}, the pseudocode is:

get a list of all days in the year,
convert dates into day of the week,
keep only the day of the week you want, and
pull the matching dates into a vector

# Setup
library(lubridate)
library(dplyr)

# Get all days in the year (offsets 0-364 cover Jan 1 through Dec 31, 2023)
all_year <- ymd(20230101) + days(0:364)

# Get all Sundays
data.frame(date = all_year) %>%
  mutate(dow = wday(date)) %>%
  filter(dow == 1) %>%  # wday() returns 1 for Sunday by default
  pull(date)

 [1] "2023-01-01" "2023-01-08" "2023-01-15" "2023-01-22"
 [5] "2023-01-29" "2023-02-05" "2023-02-12" "2023-02-19"
 [9] "2023-02-26" "2023-03-05" "2023-03-12" "2023-03-19"
[13] "2023-03-26" "2023-04-02" "2023-04-09" "2023-04-16"
[17] "2023-04-23" "2023-04-30" "2023-05-07" "2023-05-14"
[21] "2023-05-21" "2023-05-28" "2023-06-04" "2023-06-11"
[25] "2023-06-18" "2023-06-25" "2023-07-02" "2023-07-09"
[29] "2023-07-16" "2023-07-23" "2023-07-30" "2023-08-06"
[33] "2023-08-13" "2023-08-20" "2023-08-27" "2023-09-03"
[37] "2023-09-10" "2023-09-17" "2023-09-24" "2023-10-01"
[41] "2023-10-08" "2023-10-15" "2023-10-22" "2023-10-29"
[45] "2023-11-05" "2023-11-12" "2023-11-19" "2023-11-26"
[49] "2023-12-03" "2023-12-10" "2023-12-17" "2023-12-24"
[53] "2023-12-31"

The wday() function defaults to Sunday being 1. This can be changed by setting the week_start parameter to another day of the week.

For example, this is how you’d make the week start on Monday.

# Start beginning of week on a Monday
wday("2023-07-18", week_start = 1)  # This date is a Tuesday
# [1] 2

# Defaults to Sunday being beginning of the week
wday("2023-07-18")
# [1] 3
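
The hard-coded offsets above assume a 365-day year. As a minimal sketch in base R (no {lubridate}), the same idea can be written with seq() so that it also covers leap years. The helper name dates_for_weekday() is hypothetical. Note that base R's POSIXlt numbers weekdays from 0 (Sunday) to 6 (Saturday), unlike the 1 to 7 that wday() returns.

```r
# Hypothetical helper (not from the original post): all dates in `year`
# that fall on `weekday`, where 0 = Sunday, 1 = Monday, ..., 6 = Saturday
# (the base R POSIXlt convention).
dates_for_weekday <- function(year, weekday) {
  first <- as.Date(sprintf("%d-01-01", year))
  last <- as.Date(sprintf("%d-12-31", year))
  all_days <- seq(first, last, by = "day")  # 365 or 366 dates
  all_days[as.POSIXlt(all_days)$wday == weekday]
}

sundays_2023 <- dates_for_weekday(2023, 0)
length(sundays_2023)  # 53, matching the output above
```

Because seq() stops at December 31 regardless of the year's length, the same call should work unchanged for a leap year such as 2024.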

Updating your local branch after getting GitHub suggestions

2023-06-19T00:00:00-07:00

GitHub has a useful feature to add code change suggestions right in the web UI.

This is great. But then what if you want to continue editing locally? Here is some code to help you do that, starting from making the initial pull request.

git checkout -b new-branch

# ...Make changes

git add files-changed.txt
git commit -m "Changed files"
git push origin new-branch

# On GitHub some suggestions are made

git fetch

git checkout new-branch
git pull origin new-branch

How I created an RStudio addin, pyblack, to format Python code with black

2023-06-06T00:00:00-07:00

I recently created a small (toy) project called pyblack. It helps format your Python code in RStudio with the popular formatter, black.

This started out with writing Python code in RStudio and wanting to format it, specifically in RMarkdown and Quarto code chunks. For R, the {styler} package provides an RStudio addin for formatting code. I wanted a similar tool for Python, so here is a little behind the scenes on how I did this.

I actually created another RStudio addin called unnestIfElse to help automatically convert long nested ifelse() statements to a nicer dplyr::case_when().

I didn’t write down my thoughts about it at the time like I am with this addin, but looking at my comments, I may have drawn inspiration from AlignAssign. This addin aligns assignment operators within a highlighted area.

Regardless, I have two places to draw code from that do what I want. Namely, I want some code that takes the code from a highlighted area and then changes it.

The first important function to learn about is the getSourceEditorContext() function. It comes from the {rstudioapi} R package¹ and can extract highlighted text into an object.

capture <- rstudioapi::getSourceEditorContext()

This returns a nested list with, among other things, the selected text from an editor. This is progress.

After some exploration, I found that I could get the correct text using this²:

code <-
    capture %>%
    magrittr::extract2("selection") %>%
    magrittr::extract2(1) %>%
    magrittr::extract2("text")
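
For readers who have not used extract2() before, here is a small sketch showing that the chain above is the same lookup as base R's [[ indexing. The capture object below is a mock: its shape is only an assumption based on the fields used in this post, and a real editor context from {rstudioapi} contains more than this.

```r
# Mock object standing in for rstudioapi::getSourceEditorContext().
# The structure is assumed from the fields this post uses.
capture <- list(
  id = "mock-document-id",
  selection = list(
    list(text = "x=1")
  )
)

# Same lookup as the extract2() chain, written with base R indexing
code <- capture[["selection"]][[1]][["text"]]
print(code)
```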

Next, I needed to figure out how to take this code and format it using black.

Using the system2() function, I can have R call system commands.

After some troubleshooting, I figured out how to also specify a pyproject.toml file for black to reference when following custom user configuration.

So I finally did enough troubleshooting to translate this

black -v --config ~/path/to/pyproject.toml file_to_format.py

to this

system2(
  "black",
  c(
    "-v",
    "--config ~/path/to/pyproject.toml",
    "file_to_format.py"
  )
)

I added the -v for future troubleshooting ease³.

Now how do I get to file_to_format.py? I found another example of prettifying code using prettifyAddins. At first glance, this would have done the job. But it only applies black to Python files. I wanted a way to format Python code chunks in RMarkdown.

But what I did get from this addin is the idea to write out the extracted code to a temporary file to be formatted.

tmpFile <- tempfile(fileext = ".py")
writeLines(code, tmpFile)

I got some feedback that if there are lots of code blocks, there will be lots of input/output writing that can cause things to slow down. Unfortunately, I couldn’t find a way to cleanly stream code directly to black without dealing with a long-troubleshooting-with-escaping-quotes headache⁴.
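
To make the moving parts concrete, here is a rough sketch of how the temporary file and the system2() call could be combined into one helper. This is not the actual pyblack implementation: the name style_black_sketch() is hypothetical, and returning the code unchanged when black is not installed is an assumption made only so the sketch runs anywhere.

```r
# Hypothetical sketch: format a string of Python code with black by
# round-tripping it through a temporary file.
style_black_sketch <- function(code, config = NULL) {
  # Assumption for this sketch: if black is missing, hand the code back untouched
  if (!nzchar(Sys.which("black"))) {
    warning("black was not found on the PATH; returning code unchanged")
    return(code)
  }

  tmp_file <- tempfile(fileext = ".py")
  on.exit(unlink(tmp_file), add = TRUE)  # clean up the temporary file
  writeLines(code, tmp_file)

  # Only pass --config when a pyproject.toml path was supplied
  args <- c(
    if (!is.null(config)) c("--config", shQuote(path.expand(config))),
    shQuote(tmp_file)
  )
  status <- system2("black", args, stdout = FALSE, stderr = FALSE)
  if (status != 0) {
    stop("black exited with status ", status)
  }

  # black rewrites the file in place, so read the formatted code back
  paste(readLines(tmp_file), collapse = "\n")
}
```

Calling style_black_sketch("x=1") should return the formatted string when black is available, and the input itself (with a warning) when it is not.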

Now after styling with black, I can reinject the code using this code here.

contents <- style_black(code)
rstudioapi::modifyRange(
  location = capture[["selection"]][[1]][["range"]],
  text = contents,
  id = capture[["id"]])

This pulls the location metadata from the initial source context when we extracted the text from the editor.

All is well. My initial goal is done. But I got challenged to see if I could then apply this formatting on all Python code chunks in an RMarkdown or Quarto document.

Based on how I have been extracting code and replacing it, I expected a world of hurt from a number of for loops and making sure I was tracking code positions correctly⁵.

Thanks to Alex, who gave me code similar to the below that solves just this.

document <- parsermd::parse_rmd(file, parse_yaml = FALSE)

document <- purrr::modify_if(
  document,
  .p = function(chunk) {
    inherits(chunk, "rmd_chunk") &&
      identical(chunk$engine, "python") &&
      # Check whether code chunk explicitly says `black = FALSE`
      ifelse(is.null(chunk$options$black),
             TRUE,
             as.logical(chunk$options$black))
  },
  .f = function(chunk) {
    chunk$code <- style_black(chunk$code)
    chunk
  }
)

writeLines(parsermd::as_document(document), file)

I was mostly unfamiliar with the functions here, but ultimately, this makes use of the {parsermd} R package. This package parses the Markdown-like document into an abstract syntax tree (AST) that can then be manipulated programmatically⁶.
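
The part that decides which chunks get styled can be tried out without {parsermd} at all. Below is a small sketch using mock chunks: mock_chunk() and should_style() are hypothetical names, and the mock structure only mirrors the fields the code above touches (engine, options, and the rmd_chunk class).

```r
# Hypothetical stand-in for a parsed chunk; real chunks come from
# parsermd::parse_rmd() and carry more fields than this.
mock_chunk <- function(engine, black = NULL) {
  structure(
    list(engine = engine, options = list(black = black), code = "x=1"),
    class = "rmd_chunk"
  )
}

# Style Python chunks unless the chunk explicitly sets `black = FALSE`
should_style <- function(chunk) {
  opt <- chunk$options$black
  inherits(chunk, "rmd_chunk") &&
    identical(chunk$engine, "python") &&
    (is.null(opt) || isTRUE(as.logical(opt)))
}

chunks <- list(
  mock_chunk("python"),                 # no option set: style it
  mock_chunk("python", black = FALSE),  # opted out: leave it alone
  mock_chunk("r")                       # not Python: leave it alone
)
styled <- vapply(chunks, should_style, logical(1))
print(styled)  # TRUE FALSE FALSE
```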

With that complete, I now have two ways to format Python code:

Style selected code that I highlighted
Style all Python code blocks in an entire RMarkdown/Quarto document

The last step is to then specify my functions in inst/rstudio/addins.dcf so that RStudio knows these are addins like below.

Name: Style selection with black
Description: Style selected Python code with black
Binding: style_black_selection
Interactive: true

Name: Style active file with black
Description: Style active RMarkdown or Quarto Python code blocks with black
Binding: style_active_file_black
Interactive: true

In conclusion, I hope you’ve enjoyed learning a bit about how to programmatically manipulate text in RStudio and now have a reference if you too want to create your own RStudio addin. Here is the project again if you want to take a look or try it for yourself https://github.com/erictleung/pyblack.

¹ ICYMI Posit has an API to programmatically access RStudio!
² I like to use these convenient {magrittr} functions besides the %>%.
³ This returns more verbose stdout and stderr when formatting.
⁴ This only works for simple examples like black --code "print ( 'hello, world' )".
⁵ This especially gets messy when injecting new code that will then change the initial text positions. Sounds like some recursive programming that I don’t want to get into.
⁶ This package is magic. I want to learn what more I can do with this later.

How to create a custom 404 page on Jekyll

2022-06-15T00:00:00-07:00

While on Google Search Console for this site, I found this error.

Submitted URL seems to be a Soft 404

I know I have a 404 page because I had fun making it, with a message referencing Winnie-the-Pooh saying, “Oh bother!” However, in making this site, I forgot one piece that makes this an official 404 page, and I’ll outline how to fix that below.

According to GitHub’s documentation on GitHub Pages, not only do you have to create a file named 404.md or 404.html, it needs the following in the YAML front matter.

---
permalink: /404.html
---

This should officially designate this page as the 404 page rather than a page that coincidentally has the same URL.

After adding this, my original 404 page was automatically removed from my sitemap page.

Everything I googled in a week as a professional data scientist

2022-04-21T00:00:00-07:00

I ran across this blog post from a software engineer who decided to document what they googled in a week of work.

Their goal was to dispel the idea that “if you have to google stuff you’re not a software engineer.” I wanted to do something similar, but from the perspective of a data scientist.

Disclaimer: although “data science” is such a broad field and my account won’t be representative of all data workers out there, I thought it would be a useful data point for understanding what can go on in our day-to-day. This week apparently was full of package development with {pkgdown}, plotting results with {ggplot2} and making small aesthetic changes, and making a table with {gt}.

Monday

pkgdown Failed to parse example for topic - Turned out some code in a function example was invalid

git ammend specific commit message - Wanted to be more clear with a commit message

pkgdown Topics missing from index - A function was missing from my references page so I just put it back in the _pkgdown.yml file and all was good

roxygen2 documentation - Needed an overall page on roxygen2 syntax

Tuesday

gt add table header - Found the official website and just took a look at the introduction page

gt change header color - Wanted to change the color, found tab_options() and found the parameter column_labels.background.color to change the color

forcats relevel factors - To have more control on how a plot is created, I need extra control on my factors

forcats relevel by other variable - Self-explanatory, this page was useful

r get just file name of file path - Stack Overflow to the rescue with basename() and also dirname() here

gt left align columns - Eventually got me to find the cols_align() function

ggplot2 change order of legend - Need to change order of the factor levels with help here

ggplot2 change order of stacked bar - Again, factor reorder

r scales change axis to thousands - This question was good enough because it led me to a comment about unit_format(), which brings me to the next search…

r scales unit_format - Which brings me to the official documentation page and what I needed was the unit and scale parameters

r ggplot2 add numbers to bar plot - Needed geom_text() and passing in a label aesthetic

ggplot2 add two labels to bar plot - I ended up back at my previous search, but figured because of the power of ggplot2, I can simply have two geom_text() calls with two different aesthetic mappings, one to each kind of label I wanted and adjust them accordingly to fix the plot
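
Two of Tuesday’s answers can be checked in plain R without loading any plotting packages. The sketch below uses a made-up file path and made-up factor values, and it uses base R’s factor() rather than {forcats} to show the same idea: the order of the factor levels is what controls the order that plots and legends use.

```r
# File name and directory from a path (the path itself is hypothetical)
path <- "reports/2022/weekly_sales.csv"
file_name <- basename(path)  # "weekly_sales.csv"
folder <- dirname(path)      # "reports/2022"
stem <- tools::file_path_sans_ext(file_name)  # "weekly_sales"

# Setting the level order explicitly; plots and legends follow this order
sizes <- factor(
  c("small", "large", "medium"),
  levels = c("small", "medium", "large")
)
levels(sizes)  # "small" "medium" "large"
```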

Wednesday

ggplot2 stacked bar - This site helped

ggplot2 legend on top - Possible with + theme(legend.position = "top")

ggplot2 empty space - I wanted to make an empty space between certain bars in my bar plot, but I figured it might be easier to add an empty factor level instead. So…

forcats add factor - Just the documentation page

ggplot2 format x-axis labels - A solid general resource

ggplot2 change ordering of legend - I found this site, but the answer seems outdated because it doesn’t work

ggplot2 change labels with one function - I kind of didn’t search for this one exactly, rather, I used my Twitter to find the answer that uses the labs() function

ggplot2 color code geom_text - You can simply pass in a color aesthetic and manually color it

ggplot2 change number of rows in legend - I can use guides(colour = guide_legend(nrow = 1))

gghighlight - Didn’t end up using it, but still a useful package to know about

ggplot2 format y-axis - The {scales} package is absolutely wonderful, but I keep on forgetting which function to use

ggplot2 geom_col side by side bars - I always forget the position = "dodge"

ggplot2 match geom text with dodged bars - With position_dodge() within geom_text()

Thursday

ggplot2 bar width - Looks like a simple width = X in your geom_bar()

ggplot2 scales label_number - Good documentation is the best

ggplot2 change text size - Such a common thing I’d imagine this would be easier. I was in a time crunch so maybe there’s a better way for another time

?geom_vline - I remembered this is to generate a vertical line, but I have forgotten the parameters, so I ran this one right in RStudio

ggplot2 add textbox - Ah with the annotate() function

Friday

ggplot2 better spacing of geom_text stacked bar plot - This brought me to learn about the lineheight parameter, but ultimately, I wanted the text not to overlap, and after looking at the documentation, geom_text has a built-in parameter check_overlap for just this.

ggrepl for stacked bar plot - …But after using the solution above, I realized that check_overlap actually removes text that overlaps, which I didn’t want. I then found this post using ggrepel. I knew about this package but wasn’t sure if it was useful for stacked bar plots. The example here kind of works, except it changes the location of text I don’t want moving, like in the larger bars. I abandoned this and simply removed “bars” with zero values.

ggplot2 show all factors in legend - Added a drop = FALSE in there, found here

ggplot2 stacked bar plot position dodge with change in x - I was frustrated with where the text annotations for my columns were. This solution here didn’t exactly solve it outright for me, but it did show me what’s possible for moving the column labels around. The parameters I was looking for were simply the x and y aesthetics, which allow me to fine-tune where my text labels are. In hindsight, this makes sense.

Guess I was wanting to be a bit more verbose on my thoughts on these challenges. At this point, I was doing some very custom changes to my plots.

Reflection

My conclusion is similar to that of the software engineering post I linked at the beginning: a data scientist still needs to search and look things up. Regularly.

I’ve never really thought too much about what I’ve had to search for during my job. This turned out to be a really fun exercise in mindfulness. Ideally, I would keep track of these kinds of searches and then find ways to write helper functions to do these things for me. Alas, a low priority for now. But a possible side project idea.

Altogether, thank you Stack Overflow solutions, the whole ggplot2 system, and the countless volunteers out there writing out their solutions on the web for making my work possible.

Git shallow clone for faster version control

2022-01-06T00:00:00-08:00

Contributing to open-source software is great fun: you get the feeling of being a part of a larger community and adding to something larger than yourself. As a consequence, you work on large projects with lots of version control history.

This post is a reminder to myself to use this git clone flag to make it easier on my hard drive and make git work faster when doing day-to-day version control commands.

The key flag is the --depth flag. According to the documentation, this flag helps to

Create a shallow clone with a history truncated to the specified number of commits.

The specified number of commits is an integer that comes after the --depth flag.

For example, I worked on the freeCodeCamp main repository and it has twenty-nine thousand commits as of this writing. This is a lot.

So to clone this repository without so many of those commits that I won’t need, you can run this command.

git clone --depth 100 https://github.com/freeCodeCamp/freeCodeCamp.git

This will get only the last 100 commits from this repository. More on this flag here https://book.git-scm.com/docs/git-clone.

On creating the pixarfilms R package

2021-03-03T00:00:00-08:00

I’ve never published an R package all the way to CRAN before. So I finally decided it was time. So here, I will make brief notes of steps I took to publish it to CRAN and some resources that helped me along the way.

Note that this is a data-only package, so these notes are light on the function-writing side of package development.

Getting the data using the {rvest} package

I like Pixar films and so I wanted to create a package to explore information about these films.

The data I wanted to scrape was on Wikipedia here.

The package that came to mind for scraping this information was {rvest}.

I have also seen the {rvest} package used along with the {polite} package to scrape data. But unfortunately, I had some issues using the {polite} package (version 0.1.1) on my Windows machine where R couldn’t find a function.

Error in validate_key(key) : could not find function "validate_key"

So I abandoned using it. In the future, I will revisit this package and hope I will be able to use it next time.

Saving data out using the {usethis} package

To save out the CSV versions of these files, I wanted to automate how I write out the files. So below, I wrote a simple function that will take the object you want to save and save it out as a CSV file in the data-raw/ directory.

#' Save out for external use
#'
#' Write out a data frame to a CSV into the `data-raw/` directory with the same
#' name as the data frame itself.
#'
#' @param x data.frame
#'
#' @examples
#' # Saves the mtcars dataset to the path `data-raw/mtcars.csv`
#' save_data(mtcars)
save_data <- function(x) {
  # Notes on deparse() and substitute()
  # https://stackoverflow.com/a/14577878/6873133
  str_path <- paste0(deparse(substitute(x)), ".csv")
  readr::write_csv(x, here::here("data-raw", str_path))
}

These files are only used to keep a CSV record of the data.
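
The deparse(substitute(x)) idiom is the part that turns the object you pass in into a file name. Here is a minimal sketch of the same idea using only base R: it writes to a temporary directory with write.csv() instead of data-raw/, and the name save_data_sketch() is hypothetical.

```r
# Hypothetical base R variant of save_data() that writes to a temp directory
save_data_sketch <- function(x, dir = tempdir()) {
  # substitute(x) captures the unevaluated argument (e.g. the symbol mtcars);
  # deparse() turns that expression into the string "mtcars"
  file_name <- paste0(deparse(substitute(x)), ".csv")
  out_path <- file.path(dir, file_name)
  write.csv(x, out_path, row.names = FALSE)
  invisible(out_path)
}

out_path <- save_data_sketch(mtcars)
basename(out_path)  # "mtcars.csv"
```

This only gives a sensible file name when the argument is a plain object name; passing an expression such as head(mtcars) would produce an awkward name.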

The more important file to save is the .rda file so that R can read the data when you use the package. We can use the usethis::use_data() function to save it in the right place and in the right format. (Note: the {usethis} package is an amazing helper package for developing other R packages.)

x <- sample(1000)

# Saves both the object x and mtcars
usethis::use_data(x, mtcars)

More on this can be found at https://r-pkgs.org/data.html.

Basic package setup

A major resource that helped me all the way through and suggested some useful packages along the way can be found here.

It is a long read but it goes way more in-depth than I will.

I also used Hadley Wickham’s {babynames} package repository as a template for things I should look for when creating my own data package.

To start, here are some basic packages to install/load.

library(roxygen2)   # Documentation
library(devtools)   # Development
library(testthat)   # Testing
library(usethis)    # Package setup and workflow automation

Create basic tests using the {testthat} package

Because this is a simple data package, there isn’t much testing required. However, in mirroring Hadley Wickham’s {babynames} R package, I added some tests to check if the data has changed since I last ran it.

Here is a little bit of code that I’ve used.

test_that("Pixar films head and tail", {
  expect_known_output(
    first_last(pixar_films),
    "test-data_pixar_films.txt",
    print = TRUE
  )
})

Here are five notable points about the test above:

Use the test_that() function from the {testthat} package to create a test
The first quoted parameter is the name of the test (here it is “Pixar films head and tail”)
The expect_known_output() function compares data to some file output
That file output is found in the same directory as your tests
The output file is a simple text file, here named test-data_pixar_films.txt
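
The first_last() helper used in the test is not shown in this post. A plausible definition, offered only as an assumption about what such a helper does (the one in the package may differ), stacks the first and last few rows of a data frame so the snapshot file stays small:

```r
# Hypothetical definition: keep the first and last `n` rows of a data frame
first_last <- function(df, n = 5) {
  rbind(head(df, n), tail(df, n))
}

snapshot <- first_last(mtcars, n = 3)
nrow(snapshot)  # 6
```

Comparing only the head and tail keeps the stored output short while still catching changes at either end of the data.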

pkgdown setup with GitHub Actions

GitHub Actions help automate testing and deployment of your website, conveniently all within GitHub. Here are some convenience functions to set them up.

# Automate deployment of your website
usethis::use_github_action("pkgdown")

# Automate testing your package
usethis::use_github_action_check_release()

This will set up GitHub to deploy your website to your gh-pages branch. After going to your repository settings, you can change it so that your website will be hosted from there instead of your main branch.

Luckily, most of the configuration is done for you, but in case you are curious, I found GitHub Actions’ documentation helpful and clear on how to set it up. The “Workflow syntax for GitHub Actions” section was a great reference.

For R specifically, you can find where all of these GitHub Actions are at https://github.com/r-lib/actions/tree/master/examples.

Create a hexsticker logo using the {hexSticker} package

I used the {hexSticker} package to help generate the logo. Take a look at the examples in their README to find common use cases. My use case was to use an external image, specifying a path to the image when passing it into the sticker() function.

library(hexSticker)
library(showtext)

# Add Google Font
font_add_google("Cormorant Garamond", "garamond")
showtext_auto() # Use this font in all rendering

imgurl <- "man/figures/SeekPng.com_pixar-lamp-png_1678537.png"
sticker(
  imgurl,
  # Package settings
  package = "pixarfilms",
  p_size = 25,
  p_color = "#000000",
  p_family = "garamond",
  # Hexagon settings
  h_fill = "#89B9F7",
  h_color = "#000000",
  # Subplot or image settings
  s_x = 1,
  s_y = 0.75,
  s_width = 0.35,
  filename = "man/figures/logo.png"
)

I ran across the website TinyPNG, which can compress your images. This can be useful in keeping the size of your package small. Alternatively, you can opt to use the {tinieR} R package to do things all within R.

Finishing touches and submitting to CRAN

At this point, we can take a look at the “Release a package” section of the R packages book.

You can spell check your code.

# Performs spell check
devtools::spell_check()

# Creates word list for any words not standard, e.g., Pixar
usethis::use_spell_check()

As of this writing, there appears to be a bug when using the rhub::check() function, which fails with an error claiming there is no “utf8” package. A helpful hint that I found here says to run this instead.

# Using rhub
rhub::check(
  platform = "windows-x86_64-devel",
  env_vars = c(R_COMPILE_AND_INSTALL_PACKAGES = "always")
)

# Or using devtools
devtools::check_rhub(
  platform = "windows-x86_64-devel",
  env_vars = c(R_COMPILE_AND_INSTALL_PACKAGES = "always")
)

Once those are complete, you can then use the following to submit to CRAN.

devtools::release()

This will run automated checks and ask a series of questions making sure you’ve performed a number of checks like the rhub check. Afterward, it will automatically submit your package to CRAN.

In sum

Above are some notes to me and others on how I created my {pixarfilms} R package.

Here are useful resources I used and will refer back to:

Setup GitHub Actions to validate repository links

2021-02-11T00:00:00-08:00

I think there’s a movement to move some continuous integration from Travis CI to GitHub Actions.

So here’s a post on how I converted one of my repositories, first by reviewing some of the GitHub interfaces and then creating it through the terminal.

So my repository awesome-nosql-guides has a tab labeled “Actions”.

Going there, you’re shown a screen talking about workflows here and there. If you haven’t set one of these up, this page should be mostly blank.

There is a handy workflow template that GitHub starts up for you if you click on “New Workflow”. Although it will be inaccessible for you, my new workflow link would look like this:

https://github.com/erictleung/awesome-nosql-guides/actions/new

There is also a Quickstart for GitHub Actions available.

But going through and creating this workflow, I found this page “Workflow syntax for GitHub Actions” the most useful. The documentation is very clear on what is what once you get used to reading YAML syntax.

From the terminal, you’ll need to create a folder called workflows/ within the .github/ directory. If you’re in the root of your project, you can run this.

# Create folders and parent directories as needed
mkdir -p .github/workflows

Within the workflows/ directory, this is where you’ll create your workflows. Essentially, this is where all your translated .travis.yml configurations will go.

Here’s an annotated GitHub Action I set up.

# Name of your workflow that GitHub displays
name: Check Resources

# Name of GitHub event that activates the workflow (required)
on: [push,pull_request]

# List of jobs to be run for workflow
jobs:
  # Name of job
  validate_links:
    name: Validate links  # optional
    runs-on: ubuntu-latest  # type of machine to run on
    steps:
      # The steps below use published actions, referenced with `uses`
      - name: Checkout source files
        uses: actions/checkout@v2

      - name: Setup Ruby 2.6
        uses: ruby/setup-ruby@v1
        with:
          ruby-version: '2.6'  # quoted so YAML does not read it as a number

      # You can also run your custom commands if no published action exists
      - name: Run checks on links
        run: |
          gem install awesome_bot
          awesome_bot --allow-ssl --allow 302,429 --allow-dupe -f README.md

Here is a list of operating systems you can place within the jobs.<job_id>.runs-on option above.

You can take a look at more GitHub Actions on this Awesome-Actions page with a curated list of great things you can do with GitHub Actions. I hope to make use of this feature more in the future.

Reflecting on exploratory versus explanatory data visualization

2021-01-16T00:00:00-08:00

I still haven’t created examples for the #TidyTuesday project.

But in looking at other submissions and comparing them with some of the visualizations I was preparing to create, I had some real insight into the difference between exploratory and explanatory data visualizations as I reflected on why I liked certain examples more than others and my own.

Exploratory data analysis, as the name implies, is about exploring the data. These figures can be quite complex and show a lot of data.

I noticed this faceted plot example. It is a nice faceted plot and cannot be understood with one look. It took me some time to read the legend and scan back and forth across all the years to understand its meaning.

This is what makes this a good exploratory plot. It invites the viewer to explore and think about the work and data.

Although this plot is complex, I think it is an exemplar of an exploratory plot, much like an infographic.

Here’s another good exploratory plot showing a network of the 300 most common transatlantic slave routes.

#TidyTuesday Week 25.
Network graph linking the 300 most common transatlantic slave routes. The routes are grouped according to random walks, highlighting some of the colonies of each nation. pic.twitter.com/QzH1EdHc02
— MissingNotAtRandom (@AtMissing) June 18, 2020

I really enjoyed this plot because of the various annotations scattered throughout the visual. These enhance the plot’s meaning and understanding.

On the other hand, there are explanatory plots.

These have more thought and purpose to what they wish to show.

For example, the linked plot below is comparing the number of paintings acquired from a prolific artist, Joseph Mallord William Turner, versus everyone else.

I went ultra-simple for #TidyTuesday, but a new thing for me was using the {#glue} 📦 which I love. Code on my GitHub https://t.co/eMRb3GP0G0. A visualisation about the Tate's favourite artist. pic.twitter.com/EfzRddNAbm
— Jack Davison (@JDavison_) January 15, 2021

These plots are typically not complex. The above plot is a standard histogram you learn in middle or high school. However, it is very effective in telling you a “story” or message.

To me, it shows

how prolific an artist Joseph Mallord William Turner was, and
how many paintings the Tate Art Museum has acquired.

These points are immediately clear.

I wrote this post because as I was creating my own visualizations for the #TidyTuesday project, I noticed how I didn’t feel as drawn to my examples as much as these others I found.

I then reflected on what kind of plot I was making and what insights or information I could learn from the plot. I realized I didn’t have a clear purpose in creating the plot other than to use a particular ggplot2 extension package, ggalt.

Although I may be overthinking it, this single exploration into the #TidyTuesday project has reminded me of what makes a good visualization. I hope to finally participate, share, and continue to learn from making more visualizations.

Side note, a great resource on exploratory data analysis can be found using NIST’s Engineering Statistics Handbook.

Speed up Anaconda load on WSL

2020-06-23T00:00:00-07:00

I use the Windows Subsystem for Linux on my work computer. Lately, the startup time for my Linux shell has taken too long for my taste and I set out to try and figure out why. I was able to figure out how to decrease my nearly 15 second wait (an eternity in programming) to nearly instantaneous. There is a slight caveat to it but I don’t mind that extra inconvenience.

After a lot of searching around, I found out that my Anaconda/miniconda initialization was hogging all the time. This is because by default, I’ve set it up where conda activate base is called every time I create a shell. What the final solution does is remove this step and have you manually activate the environment whenever you need it.

Before I figured out the eventual solution, I tried to blame the WSL shell itself. Looking around, I found there was an upgrade to WSL 2 available. This brought me to threads like this one.

One solution suggested running

sudo apt update
sudo apt dist-upgrade

This gave me hope but it didn’t work. I eventually figured that because it was a work computer and I didn’t want to risk upgrading to a new system and everything breaking, I would abandon this potential solution.

This frustration then brought me to this thread. It sounds like I’m not the only one who has experienced this lag time. Even though the thread was from 2018, it seems relevant.

I gave their solutions a try. No luck.

The first thing I tried was to change the absolute path to a relative one. I was skeptical this would work. And I was right in thinking so.

Scrolling down in the thread a bit more, I came across this comment. Near the bottom of the comment, it notes to comment out the code between # >>> conda initialize >>> and # <<< conda initialize <<<, and then to copy just the inner if/else statement.

In my bash configuration (which should be somewhere either in .bashrc or .bash_profile), I have the following:

# >>> conda initialize >>>
# !! Contents within this block are managed by 'conda init' !!
__conda_setup="$('/home/leunge/miniconda/bin/conda' 'shell.bash' 'hook' 2> /dev/null)"
if [ $? -eq 0 ]; then
    eval "$__conda_setup"
else
    if [ -f "$HOME/miniconda/etc/profile.d/conda.sh" ]; then
        . "$HOME/miniconda/etc/profile.d/conda.sh"
    else
        export PATH="$HOME/miniconda/bin:$PATH"
    fi
fi
unset __conda_setup
# <<< conda initialize <<<

I commented most of that out and copied out that inner if block.

if [ -f "$HOME/miniconda/etc/profile.d/conda.sh" ]; then
    . "$HOME/miniconda/etc/profile.d/conda.sh"
else
    export PATH="$HOME/miniconda/bin:$PATH"
fi

Previously, my shell configuration essentially ran conda activate base with every new shell. With this new setup, I am no longer in an activated environment.

To double check that this was the issue, I timed it.

$ time conda activate base

real    0m15.461s
user    0m3.188s
sys     0m11.516s

Yep. That was the issue.

But can I still access conda and all of my tools? It turns out if I need to be in an Anaconda environment, I’ll have to remember to run conda activate base before doing anything. The code block above ensures I still have access to the conda command: it sources conda.sh when that file exists and otherwise adds miniconda to my PATH.

This is a minor inconvenience I’m willing to take for the sake of time.

$ time bash ~/.bash_profile

real    0m0.131s
user    0m0.016s
sys     0m0.078s