Jekyll
2023-12-11T22:12:36-08:00
https://erictleung.com/feed.xml
Eric Leung
Code and Data Learnings
Eric T Leung
leung@erictleung.com
Get all dates for a day of the week
2023-07-18T00:00:00-07:00
https://erictleung.com/get-dates-for-all-days
<p>For some date-specific work, I wanted to get a list of all dates in the year, for a specific day of the week. Here is how I did this using R.</p>
<p>Using {lubridate}, the pseudocode is:</p>
<ul>
<li>get a list of all days in the year,</li>
<li>convert each date into its day of the week, and</li>
<li>filter to the day you want and pull those dates into a vector</li>
</ul>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Setup</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">lubridate</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">dplyr</span><span class="p">)</span><span class="w">
</span><span class="c1"># Get all days in the year</span><span class="w">
</span><span class="n">all_year</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">ymd</span><span class="p">(</span><span class="m">20230101</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">days</span><span class="p">(</span><span class="m">0</span><span class="o">:</span><span class="m">364</span><span class="p">)</span><span class="w">
</span><span class="c1"># Filter to a single day of the week</span><span class="w">
</span><span class="n">data.frame</span><span class="p">(</span><span class="n">date</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">all_year</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">mutate</span><span class="p">(</span><span class="n">dow</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">wday</span><span class="p">(</span><span class="n">date</span><span class="p">))</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">filter</span><span class="p">(</span><span class="n">dow</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="m">1</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w"> </span><span class="c1"># Get all Sundays</span><span class="w">
</span><span class="n">pull</span><span class="p">(</span><span class="n">date</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> [1] "2023-01-01" "2023-01-08" "2023-01-15" "2023-01-22"
[5] "2023-01-29" "2023-02-05" "2023-02-12" "2023-02-19"
[9] "2023-02-26" "2023-03-05" "2023-03-12" "2023-03-19"
[13] "2023-03-26" "2023-04-02" "2023-04-09" "2023-04-16"
[17] "2023-04-23" "2023-04-30" "2023-05-07" "2023-05-14"
[21] "2023-05-21" "2023-05-28" "2023-06-04" "2023-06-11"
[25] "2023-06-18" "2023-06-25" "2023-07-02" "2023-07-09"
[29] "2023-07-16" "2023-07-23" "2023-07-30" "2023-08-06"
[33] "2023-08-13" "2023-08-20" "2023-08-27" "2023-09-03"
[37] "2023-09-10" "2023-09-17" "2023-09-24" "2023-10-01"
[41] "2023-10-08" "2023-10-15" "2023-10-22" "2023-10-29"
[45] "2023-11-05" "2023-11-12" "2023-11-19" "2023-11-26"
[49] "2023-12-03" "2023-12-10" "2023-12-17" "2023-12-24"
[53] "2023-12-31"
</code></pre></div></div>
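<p>The steps above can also be wrapped in a small helper. The sketch below is a base R alternative to the {lubridate} pipeline, not the code from this post. The function name <code class="language-plaintext highlighter-rouge">dates_for_weekday()</code> is made up for illustration, and it assumes the same numbering as the default used above (1 = Sunday).</p>

```r
# Hypothetical helper (not from the original post), base R only.
# `weekday` uses 1 = Sunday, ..., 7 = Saturday.
dates_for_weekday <- function(year, weekday) {
  stopifnot(weekday %in% 1:7)
  # seq() by day covers 365 or 366 days, so leap years need no special case
  all_year <- seq(
    as.Date(sprintf("%d-01-01", year)),
    as.Date(sprintf("%d-12-31", year)),
    by = "day"
  )
  # POSIXlt stores the weekday as 0 (Sunday) to 6 (Saturday)
  dow <- as.POSIXlt(all_year)$wday + 1
  all_year[dow == weekday]
}

sundays_2023 <- dates_for_weekday(2023, 1)
mondays_2024 <- dates_for_weekday(2024, 2)
```

<p>Building the sequence from January 1 to December 31 avoids hard-coding the number of days in the year.</p>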
<p>The <code class="language-plaintext highlighter-rouge">wday()</code> function defaults to Sunday being 1.
This can be changed by setting the <code class="language-plaintext highlighter-rouge">week_start</code> parameter to another day of the week.</p>
<p>For example, this is how you’d make the week start on Monday.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Start beginning of week on a Monday
wday("2023-07-18", week_start = 1) # This date is a Tuesday
# [1] 2
# Defaults to Sunday being beginning of the week
wday("2023-07-18")
# [1] 3
</code></pre></div></div>
Eric T Leung
leung@erictleung.com
Updating your local branch after getting GitHub suggestions
2023-06-19T00:00:00-07:00
https://erictleung.com/update-local-with-github-suggestion
<p>GitHub has a useful feature to add code change suggestions right in the web UI.</p>
<p>This is great. But then what if you want to continue editing locally? Here is
some code to help you do that, starting from making the initial pull request.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>git checkout -b new-branch
# ...Make changes
git add files-changed.txt
git commit -m "Changed files"
git push origin new-branch
# On GitHub some suggestions are made
git fetch
git checkout new-branch
git pull origin new-branch
</code></pre></div></div>
Eric T Leung
leung@erictleung.com
How I created an RStudio addin, pyblack, to format Python code with black
2023-06-06T00:00:00-07:00
https://erictleung.com/pyblack-rstudio-addin
<p>I recently created a small (toy) project called
<a href="https://github.com/erictleung/pyblack">pyblack</a>. It helps format your Python
code in RStudio with the popular formatter,
<a href="https://github.com/psf/black">black</a>.</p>
<p>This started out with writing Python code in RStudio and wanting to format it,
specifically in RMarkdown and Quarto code chunks. For R, RStudio can format code
through the <a href="https://github.com/r-lib/styler/">{styler}</a> addin. I wanted a
similar tool for Python, so here is a little behind-the-scenes look at how I did this.</p>
<p>I actually created another RStudio addin called
<a href="https://github.com/erictleung/unnestIfElse/">unnestIfElse</a> to help
automatically convert long nested <code class="language-plaintext highlighter-rouge">ifelse()</code> statements to a nicer
<code class="language-plaintext highlighter-rouge">dplyr::case_when()</code>.</p>
<p>I didn’t write up my thoughts on that addin like I am with this one, but
looking at my comments, I may have drawn inspiration from
<a href="https://github.com/seasmith/AlignAssign">AlignAssign</a>. This addin aligns
assignment operators within a highlighted area.</p>
<p>Regardless, I have up to two places to draw code from that do what I want.
Namely, I want some code to help take code from some highlighted area and then
change it.</p>
<p>The first important function to learn about is the <code class="language-plaintext highlighter-rouge">getSourceEditorContext()</code>
function. It comes from the
<a href="https://rstudio.github.io/rstudioapi/">{rstudioapi}</a> R package<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup> and
can extract highlighted text into an object.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">capture</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">rstudioapi</span><span class="o">::</span><span class="n">getSourceEditorContext</span><span class="p">()</span><span class="w">
</span></code></pre></div></div>
<p>This returns a nested list with, among other things, the selected text from an
editor. This is progress.</p>
<p>After some exploration, I found that I could get the correct text using
this<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>:</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">code</span><span class="w"> </span><span class="o"><-</span><span class="w">
</span><span class="n">capture</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">magrittr</span><span class="o">::</span><span class="n">extract2</span><span class="p">(</span><span class="s2">"selection"</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">magrittr</span><span class="o">::</span><span class="n">extract2</span><span class="p">(</span><span class="m">1</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">magrittr</span><span class="o">::</span><span class="n">extract2</span><span class="p">(</span><span class="s2">"text"</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>
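<p>For reference, the same extraction can be written with base R’s <code class="language-plaintext highlighter-rouge">[[</code>. The sketch below uses a mock list that only imitates the nested shape described above, since the real object is only available inside RStudio. The mock values are assumptions for illustration.</p>

```r
# Mock of the nested list shape (an assumption for illustration);
# the real object comes from rstudioapi::getSourceEditorContext() in RStudio.
capture <- list(
  id = "mock-document-id",
  selection = list(
    list(range = "placeholder-range", text = "print ( 'hello, world' )")
  )
)

# Same path as the {magrittr} chain: "selection", first element, "text"
code <- capture[["selection"]][[1]][["text"]]
```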
<p>Next, I needed to figure out how to take this code and format it using black.</p>
<p>Using the <code class="language-plaintext highlighter-rouge">system2()</code> function, I can have R call system commands.</p>
<p>After some troubleshooting, I figured out how to also specify a <code class="language-plaintext highlighter-rouge">pyproject.toml</code>
file for black to reference when following custom user configuration.</p>
<p>So I finally did enough troubleshooting to translate this</p>
<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>black <span class="nt">-v</span> <span class="nt">--config</span> ~/path/to/pyproject.toml file_to_format.py
</code></pre></div></div>
<p>to this</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">system2</span><span class="p">(</span><span class="w">
</span><span class="s2">"black"</span><span class="p">,</span><span class="w">
</span><span class="nf">c</span><span class="p">(</span><span class="w">
</span><span class="s2">"-v"</span><span class="p">,</span><span class="w">
</span><span class="s2">"--config ~/path/to/pyproject.toml"</span><span class="p">,</span><span class="w">
</span><span class="s2">"file_to_format.py"</span><span class="w">
</span><span class="p">)</span><span class="w">
</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>
<p>I added the <code class="language-plaintext highlighter-rouge">-v</code> for future troubleshooting ease<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup>.</p>
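<p>One way to make this call a little more defensive is sketched below. The helper names <code class="language-plaintext highlighter-rouge">build_black_args()</code> and <code class="language-plaintext highlighter-rouge">run_black()</code> are hypothetical and not part of pyblack. The sketch assumes black is installed and on the PATH, and it quotes paths because <code class="language-plaintext highlighter-rouge">system2()</code> does not quote arguments itself.</p>

```r
# Hypothetical helper: assemble the argument vector passed to system2().
build_black_args <- function(file, config = NULL, verbose = TRUE) {
  args <- character()
  if (verbose) args <- c(args, "-v")
  if (!is.null(config)) {
    # path.expand() resolves "~"; shQuote() protects spaces in paths
    args <- c(args, "--config", shQuote(path.expand(config)))
  }
  c(args, shQuote(file))
}

# Hypothetical wrapper: run black on a file and report success.
run_black <- function(file, config = NULL) {
  # Sys.which() returns "" when the executable is not on the PATH
  if (!nzchar(Sys.which("black"))) {
    stop("black was not found on the PATH")
  }
  # system2() returns the exit status when output is not captured
  status <- system2("black", build_black_args(file, config))
  invisible(status == 0)
}
```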
<p>Now how do I get to <code class="language-plaintext highlighter-rouge">file_to_format.py</code>? I found another example of prettifying
code using <a href="https://github.com/stla/prettifyAddins">prettifyAddins</a>. At first
glance, this would have done the job. But it only applies black to Python files.
I wanted a way to format Python code chunks in RMarkdown.</p>
<p>But what I did get from this addin is the idea to write out the extracted code
to a temporary file to be formatted.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">tmpFile</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">tempfile</span><span class="p">(</span><span class="n">fileext</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">".py"</span><span class="p">)</span><span class="w">
</span><span class="n">writeLines</span><span class="p">(</span><span class="n">code</span><span class="p">,</span><span class="w"> </span><span class="n">tmpFile</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>
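<p>Put together, the temporary-file round trip might look like the sketch below. This is not the actual <code class="language-plaintext highlighter-rouge">style_black()</code> from pyblack. The function name and the <code class="language-plaintext highlighter-rouge">format_file</code> argument are assumptions for illustration, and a stand-in formatter is used so the example runs without black installed.</p>

```r
# Hypothetical round trip: write code out, format the file in place, read it back.
style_with_tempfile <- function(code, format_file) {
  tmp_file <- tempfile(fileext = ".py")
  # Remove the temporary file once the function returns
  on.exit(unlink(tmp_file), add = TRUE)
  writeLines(code, tmp_file)
  format_file(tmp_file)
  readLines(tmp_file)
}

# Stand-in formatter for demonstration: strips trailing whitespace in place.
# With black installed, this could instead be
# function(path) system2("black", shQuote(path))
trim_trailing <- function(path) {
  writeLines(sub("[ \t]+$", "", readLines(path)), path)
}

formatted <- style_with_tempfile(c("x = 1   ", "print(x)\t"), trim_trailing)
```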
<p>I got some feedback that if there are lots of code blocks, there will be lots of
input/output writing that can cause things to slow down. Unfortunately, I
couldn’t find a way to cleanly stream code directly to black without dealing with
a long-troubleshooting-with-escaping-quotes headache<sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup>.</p>
<p>After styling with black, I can reinject the formatted code into the editor like this.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">contents</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">style_black</span><span class="p">(</span><span class="n">code</span><span class="p">)</span><span class="w">
</span><span class="n">rstudioapi</span><span class="o">::</span><span class="n">modifyRange</span><span class="p">(</span><span class="w">
</span><span class="n">location</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">capture</span><span class="p">[[</span><span class="s2">"selection"</span><span class="p">]][[</span><span class="m">1</span><span class="p">]][[</span><span class="s2">"range"</span><span class="p">]],</span><span class="w">
</span><span class="n">text</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">contents</span><span class="p">,</span><span class="w">
</span><span class="n">id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">capture</span><span class="p">[[</span><span class="s2">"id"</span><span class="p">]])</span><span class="w">
</span></code></pre></div></div>
<p>This pulls the location metadata from the initial source context when we
extracted the text from the editor.</p>
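<p>A guarded version of this step is sketched below. The wrapper name <code class="language-plaintext highlighter-rouge">replace_selection()</code> is hypothetical, and the sketch assumes the formatted code may arrive as several lines, which are collapsed into one string because a single range takes a single piece of text. It only does something useful inside RStudio.</p>

```r
# Hypothetical wrapper; only meaningful inside RStudio.
replace_selection <- function(capture, contents) {
  if (!rstudioapi::isAvailable()) {
    stop("This sketch needs to run inside RStudio")
  }
  rstudioapi::modifyRange(
    location = capture[["selection"]][[1]][["range"]],
    # One range gets one string, so multi-line output is collapsed first
    text = paste(contents, collapse = "\n"),
    id = capture[["id"]]
  )
}
```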
<p>All is well. My initial goal is done. But I got challenged to see if I could then
apply this formatting on all Python code chunks in an RMarkdown or Quarto
document.</p>
<p>Based on how I have been extracting code and replacing it, I expected a world of
hurt from a number of <code class="language-plaintext highlighter-rouge">for</code> loops and making sure I was tracking code positions
correctly<sup id="fnref:5" role="doc-noteref"><a href="#fn:5" class="footnote" rel="footnote">5</a></sup>.</p>
<p>Thankfully, <a href="https://github.com/rossellhayes">Alex</a> gave me code similar to
the below that solves just this.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">document</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">parsermd</span><span class="o">::</span><span class="n">parse_rmd</span><span class="p">(</span><span class="n">file</span><span class="p">,</span><span class="w"> </span><span class="n">parse_yaml</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">FALSE</span><span class="p">)</span><span class="w">
</span><span class="n">document</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">purrr</span><span class="o">::</span><span class="n">modify_if</span><span class="p">(</span><span class="w">
</span><span class="n">document</span><span class="p">,</span><span class="w">
</span><span class="n">.p</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">function</span><span class="p">(</span><span class="n">chunk</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="n">inherits</span><span class="p">(</span><span class="n">chunk</span><span class="p">,</span><span class="w"> </span><span class="s2">"rmd_chunk"</span><span class="p">)</span><span class="w"> </span><span class="o">&&</span><span class="w">
</span><span class="n">identical</span><span class="p">(</span><span class="n">chunk</span><span class="o">$</span><span class="n">engine</span><span class="p">,</span><span class="w"> </span><span class="s2">"python"</span><span class="p">)</span><span class="w"> </span><span class="o">&&</span><span class="w">
</span><span class="c1"># Check whether code chunk explicitly says `black = FALSE`</span><span class="w">
</span><span class="n">ifelse</span><span class="p">(</span><span class="nf">is.null</span><span class="p">(</span><span class="n">chunk</span><span class="o">$</span><span class="n">options</span><span class="o">$</span><span class="n">black</span><span class="p">),</span><span class="w">
</span><span class="kc">TRUE</span><span class="p">,</span><span class="w">
</span><span class="nf">as.logical</span><span class="p">(</span><span class="n">chunk</span><span class="o">$</span><span class="n">options</span><span class="o">$</span><span class="n">black</span><span class="p">))</span><span class="w">
</span><span class="p">},</span><span class="w">
</span><span class="n">.f</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">function</span><span class="p">(</span><span class="n">chunk</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="n">chunk</span><span class="o">$</span><span class="n">code</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">style_black</span><span class="p">(</span><span class="n">chunk</span><span class="o">$</span><span class="n">code</span><span class="p">)</span><span class="w">
</span><span class="n">chunk</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="p">)</span><span class="w">
</span><span class="n">writeLines</span><span class="p">(</span><span class="n">parsermd</span><span class="o">::</span><span class="n">as_document</span><span class="p">(</span><span class="n">document</span><span class="p">),</span><span class="w"> </span><span class="n">file</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>
<p>I was mostly unfamiliar with the functions here, but ultimately, this makes use
of the <a href="https://github.com/rundel/parsermd">{parsermd}</a> R package. This package
parses the Markdown-like document into an abstract syntax tree (AST) to then be
manipulated programmatically<sup id="fnref:6" role="doc-noteref"><a href="#fn:6" class="footnote" rel="footnote">6</a></sup>.</p>
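<p>The chunk filter can be tried on its own, without {parsermd}. The sketch below pulls the predicate into a hypothetical <code class="language-plaintext highlighter-rouge">should_format()</code> function and runs it on mock chunks built by hand. The mock objects only imitate the fields the predicate reads. It also swaps <code class="language-plaintext highlighter-rouge">ifelse()</code> for <code class="language-plaintext highlighter-rouge">isFALSE()</code>, so unlike the code above it only treats a logical <code class="language-plaintext highlighter-rouge">FALSE</code> as an opt-out.</p>

```r
# Hypothetical standalone version of the predicate passed to modify_if().
should_format <- function(chunk) {
  inherits(chunk, "rmd_chunk") &&
    identical(chunk$engine, "python") &&
    # Format unless the chunk option is explicitly `black = FALSE`
    !isFALSE(chunk$options$black)
}

# Mock chunks that imitate only the fields used above (assumed shape)
py_chunk <- structure(
  list(engine = "python", options = list()),
  class = "rmd_chunk"
)
py_opt_out <- structure(
  list(engine = "python", options = list(black = FALSE)),
  class = "rmd_chunk"
)
r_chunk <- structure(
  list(engine = "r", options = list()),
  class = "rmd_chunk"
)

results <- c(
  should_format(py_chunk),
  should_format(py_opt_out),
  should_format(r_chunk)
)
```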
<p>With that complete, I now have two ways to format Python code:</p>
<ol>
<li>Style selected code that I highlighted</li>
<li>Style all Python code blocks in an entire RMarkdown/Quarto document</li>
</ol>
<p>The last step is to then specify my functions in <code class="language-plaintext highlighter-rouge">inst/rstudio/addins.dcf</code> so
that RStudio knows these are addins like below.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Name: Style selection with black
Description: Style selected Python code with black
Binding: style_black_selection
Interactive: true

Name: Style active file with black
Description: Style active RMarkdown or Quarto Python code blocks with black
Binding: style_active_file_black
Interactive: true
</code></pre></div></div>
<p>In conclusion, I hope you’ve enjoyed learning a bit about how to programmatically
manipulate text in RStudio and now have a reference if you too want to create
your own RStudio addin. Here is the project again if you want to take a look or
try it for yourself:
<a href="https://github.com/erictleung/pyblack">https://github.com/erictleung/pyblack</a>.</p>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>ICYMI Posit has an API to programmatically access RStudio! <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:2" role="doc-endnote">
<p>I like to use these convenient {magrittr} functions besides the <code class="language-plaintext highlighter-rouge">%>%</code> <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:3" role="doc-endnote">
<p>This returns more verbose stdout and stderr when formatting <a href="#fnref:3" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:4" role="doc-endnote">
<p>This only works for simple examples like
<code class="language-plaintext highlighter-rouge">black --code "print ( 'hello, world' )"</code> <a href="#fnref:4" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:5" role="doc-endnote">
<p>This especially gets messy when injecting new code that will then change
the initial text positions. Sounds like some recursive programming that I don’t
want to get into. <a href="#fnref:5" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:6" role="doc-endnote">
<p>This package is magic. I want to learn what more I can do with this later. <a href="#fnref:6" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
Eric T Leung
leung@erictleung.com
How to create a custom 404 page on Jekyll
2022-06-15T00:00:00-07:00
https://erictleung.com/create-custom-404
<p>While on <a href="https://search.google.com">Google Search Console</a> for this site, I
found this error.</p>
<blockquote>
<p>Submitted URL seems to be a Soft 404</p>
</blockquote>
<p>I know I have a 404 page because I had fun making it, with a message referencing
Winnie-the-Pooh saying, “Oh bother!” However, in making this site, I forgot one
piece that makes this an official 404 page, and I’ll outline how to fix that
below.</p>
<p>According to
<a href="https://docs.github.com/en/pages/getting-started-with-github-pages/creating-a-custom-404-page-for-your-github-pages-site">GitHub’s documentation on GitHub Pages</a>,
not only do you have to create a file named <code class="language-plaintext highlighter-rouge">404.md</code> or <code class="language-plaintext highlighter-rouge">404.html</code>, it also needs
the following in the YAML front matter.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>---
permalink: /404.html
---
</code></pre></div></div>
<p>This should officially designate this page as the 404 page rather than a page
that coincidentally has the same URL.</p>
<p>After adding this, my original 404 page was automatically removed from
<a href="https://erictleung.com/sitemap.xml">my sitemap page</a>.</p>
Eric T Leung
leung@erictleung.com
Everything I googled in a week as a professional data scientist
2022-04-21T00:00:00-07:00
https://erictleung.com/everything-i-googled-in-a-week-as-a-data-scientist
<p>I ran across
<a href="https://localghost.dev/blog/everything-i-googled-in-a-week-as-a-professional-software-engineer/">this blog post from a software
engineer</a>
who decided to document what they googled in a week of work.</p>
<p>Their goal was to dispel the idea that “if you have to google stuff you’re not
a software engineer.” I wanted to do something similar, but from the
perspective of a data scientist.</p>
<p><em>Disclaimer</em>: “data science” is such a broad field that my account
won’t be representative of all data workers out there, but I thought it would be
a useful data point for understanding what can go on in our day-to-day.
This week apparently was full of package development with
<a href="https://pkgdown.r-lib.org/">{pkgdown}</a>, plotting results with
<a href="https://ggplot2.tidyverse.org/">{ggplot2}</a> and making small aesthetic changes,
and making a table with <a href="https://gt.rstudio.com/">{gt}</a>.</p>
<h2 id="monday">Monday</h2>
<p><code class="language-plaintext highlighter-rouge">pkgdown Failed to parse example for topic</code> - Turned out some code in a
function example was invalid</p>
<p><code class="language-plaintext highlighter-rouge">git ammend specific commit message</code> - Wanted to be more clear with a commit
message</p>
<p><code class="language-plaintext highlighter-rouge">pkgdown Topics missing from index</code> - A function was missing from my references
page so I just put it back in the <code class="language-plaintext highlighter-rouge">_pkgdown.yml</code> file and all was good</p>
<p><code class="language-plaintext highlighter-rouge">roxygen2 documentation</code> - Needed an overall page
<a href="https://roxygen2.r-lib.org/articles/rd-formatting.html">on roxygen2 syntax</a></p>
<h2 id="tuesday">Tuesday</h2>
<p><code class="language-plaintext highlighter-rouge">gt add table header</code> - Found the official website and just took a look at
<a href="https://gt.rstudio.com/articles/intro-creating-gt-tables.html">the introduction page</a></p>
<p><code class="language-plaintext highlighter-rouge">gt change header color</code> - Wanted to change the color, found <code class="language-plaintext highlighter-rouge">tab_options()</code>
and found
<a href="https://gt.rstudio.com/reference/tab_options.html">the parameter <code class="language-plaintext highlighter-rouge">column_labels.background.color</code></a>
to change the color</p>
<p><code class="language-plaintext highlighter-rouge">forcats relevel factors</code> - To have more control on how a plot is created, I
need extra control on my factors</p>
<p><code class="language-plaintext highlighter-rouge">forcats relevel by other variable</code> - Self-explanatory,
<a href="https://forcats.tidyverse.org/reference/fct_reorder.html">this page</a>
was useful</p>
<p><code class="language-plaintext highlighter-rouge">r get just file name of file path</code> - Stack Overflow to the rescue with
<code class="language-plaintext highlighter-rouge">basename()</code> and also <code class="language-plaintext highlighter-rouge">dirname()</code>
<a href="https://stackoverflow.com/a/2548871/6873133">here</a></p>
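<p>As a quick illustration of those two functions, here is a minimal sketch with a made-up file path. The call to <code class="language-plaintext highlighter-rouge">tools::file_path_sans_ext()</code> is an extra that was not part of the search, added in case the extension also needs to go.</p>

```r
path <- "data/raw/survey_2022.csv"  # hypothetical path for illustration

file_name <- basename(path)  # file name without the directory
dir_name <- dirname(path)    # directory without the file name

# Extra step (not from the original search): drop the file extension
stem <- tools::file_path_sans_ext(file_name)
```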
<p><code class="language-plaintext highlighter-rouge">gt left align columns</code> - Eventually got me to find
<a href="https://gt.rstudio.com/reference/cols_align.html">the <code class="language-plaintext highlighter-rouge">cols_align()</code> function</a></p>
<p><code class="language-plaintext highlighter-rouge">ggplot2 change order of legend</code> - Need to change order of the factor levels
with
<a href="https://www.geeksforgeeks.org/change-display-order-of-ggplot2-plot-legend-in-r/">help here</a></p>
<p><code class="language-plaintext highlighter-rouge">ggplot2 change order of stacked bar</code> - Again,
<a href="https://stackoverflow.com/a/33541763/6873133">factor reorder</a></p>
<p><code class="language-plaintext highlighter-rouge">r scales change axis to thousands</code> - This question was
<a href="https://stackoverflow.com/q/56758733/6873133">good enough</a>
because it led me to a comment about <code class="language-plaintext highlighter-rouge">unit_format()</code>, which brings me to the
next search…</p>
<p><code class="language-plaintext highlighter-rouge">r scales unit_format</code> - Which brings me to the official documentation page
and what I needed was
<a href="https://scales.r-lib.org/reference/unit_format.html">the <code class="language-plaintext highlighter-rouge">unit</code> and <code class="language-plaintext highlighter-rouge">scale</code> parameters</a></p>
<p><code class="language-plaintext highlighter-rouge">r ggplot2 add numbers to bar plot</code> - Needed <code class="language-plaintext highlighter-rouge">geom_text()</code> and passing in
<a href="https://stackoverflow.com/a/6645506/6873133">a <code class="language-plaintext highlighter-rouge">label</code> aesthetic</a></p>
<p><code class="language-plaintext highlighter-rouge">ggplot2 add two labels to bar plot</code> - I ended up back at
<a href="https://stackoverflow.com/a/6645506/6873133">my previous search</a>,
but figured because of the power of ggplot2, I can simply have two
<code class="language-plaintext highlighter-rouge">geom_text()</code> calls with two different aesthetic mappings, one to each kind of
label I wanted and adjust them accordingly to fix the plot</p>
<h2 id="wednesday">Wednesday</h2>
<p><code class="language-plaintext highlighter-rouge">ggplot2 stacked bar</code> - This
<a href="https://r-charts.com/part-whole/stacked-bar-chart-ggplot2/">site helped</a></p>
<p><code class="language-plaintext highlighter-rouge">ggplot2 legend on top</code> - Possible with <code class="language-plaintext highlighter-rouge">+ theme(legend.position = "top")</code></p>
<p><code class="language-plaintext highlighter-rouge">ggplot2 empty space</code> - I wanted to make an empty space between certain bars in
my bar plot, but I figured it might be easier to add an empty bar instead.
So…</p>
<p><code class="language-plaintext highlighter-rouge">forcats add factor</code> - Just
<a href="https://forcats.tidyverse.org/reference/fct_expand.html">the documentation page</a></p>
<p><code class="language-plaintext highlighter-rouge">ggplot2 format x-axis labels</code> - A
<a href="http://www.sthda.com/english/wiki/ggplot2-axis-ticks-a-guide-to-customize-tick-marks-and-labels">solid general resource</a></p>
<p><code class="language-plaintext highlighter-rouge">ggplot2 change ordering of legend</code> - I found
<a href="https://learnr.wordpress.com/2010/03/23/ggplot2-changing-the-default-order-of-legend-labels-and-stacking-of-data/">this site</a>,
but the answer seems outdated because it doesn’t work</p>
<p><code class="language-plaintext highlighter-rouge">ggplot2 change labels with one function</code> - I kind of didn’t search for this
one exactly, rather, I used my Twitter to find the answer that uses
<a href="https://twitter.com/erictleung/status/1489060241933148160">the <code class="language-plaintext highlighter-rouge">labs()</code> function</a></p>
<p><code class="language-plaintext highlighter-rouge">ggplot2 color code geom_text</code> - You can simply pass in
<a href="https://stackoverflow.com/a/41544369/6873133">a color aesthetic and manually color it</a></p>
<p><code class="language-plaintext highlighter-rouge">ggplot2 change number of rows in legend</code> - I can
<a href="https://stackoverflow.com/a/44060041/6873133">use <code class="language-plaintext highlighter-rouge">guides(colour = guide_legend(nrow = 1))</code></a></p>
<p><code class="language-plaintext highlighter-rouge">gghighlight</code> - Didn’t end up using it, but still
<a href="https://cran.r-project.org/web/packages/gghighlight/vignettes/gghighlight.html">a useful package</a>
to know about</p>
<p><code class="language-plaintext highlighter-rouge">ggplot2 format y-axis</code> - The {scales} package is absolutely wonderful, but I
keep on forgetting which
<a href="https://statisticsglobe.com/change-formatting-of-numbers-of-ggplot2-plot-axis-in-r">function to use</a></p>
<p><code class="language-plaintext highlighter-rouge">ggplot2 geom_col side by side bars</code> - I always forget
<a href="https://stackoverflow.com/a/25070645/6873133">the <code class="language-plaintext highlighter-rouge">position = "dodge"</code></a></p>
<p><code class="language-plaintext highlighter-rouge">ggplot2 match geom text with dodged bars</code> - With
<a href="https://stackoverflow.com/a/6017961/6873133"><code class="language-plaintext highlighter-rouge">position_dodge()</code> within <code class="language-plaintext highlighter-rouge">geom_text()</code></a></p>
<h2 id="thursday">Thursday</h2>
<p><code class="language-plaintext highlighter-rouge">ggplot2 bar width</code> - Looks like a
<a href="https://stackoverflow.com/a/32943101/6873133">simple <code class="language-plaintext highlighter-rouge">width = X</code> in your <code class="language-plaintext highlighter-rouge">geom_bar()</code></a></p>
<p><code class="language-plaintext highlighter-rouge">ggplot2 scales label_number</code> - Good documentation is
<a href="https://scales.r-lib.org/reference/number.html">the best</a></p>
<p><code class="language-plaintext highlighter-rouge">ggplot2 change text size</code> - Such
<a href="https://statisticsglobe.com/change-font-size-of-ggplot2-plot-in-r-axis-text-main-title-legend">a common
thing</a>
that I’d imagine it would be easier. I was in a time crunch, so maybe there’s a
better way for another time</p>
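<p>Two ways that seem to cover most cases, sketched on the built-in <code class="language-plaintext highlighter-rouge">mtcars</code> data. The sizes are arbitrary picks for illustration.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code>library(ggplot2)

p <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  labs(title = "Fuel economy by weight")

# Option 1: raise the base size for every text element at once
p_big <- p + theme_grey(base_size = 16)

# Option 2: target individual elements
p_custom <- p + theme(
  axis.text  = element_text(size = 12),
  axis.title = element_text(size = 14),
  plot.title = element_text(size = 18)
)
</code></pre></div></div>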
<p><code class="language-plaintext highlighter-rouge">?geom_vline</code> - I remembered that this draws a vertical line, but I had
forgotten the parameters, so I ran this one right in RStudio</p>
<p><code class="language-plaintext highlighter-rouge">ggplot2 add textbox</code> - Ah with
<a href="https://stackoverflow.com/a/44012702/6873133">the <code class="language-plaintext highlighter-rouge">annotate()</code> function</a></p>
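<p>A sketch combining the last two searches, assuming the built-in <code class="language-plaintext highlighter-rouge">mtcars</code> data. The label coordinates and text are arbitrary.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code>library(ggplot2)

p <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  # Vertical reference line at the mean weight
  geom_vline(xintercept = mean(mtcars$wt), linetype = "dashed", colour = "red") +
  # A boxed text annotation placed at fixed coordinates
  annotate("label", x = 4.5, y = 30, label = "Dashed line: mean weight")
p
</code></pre></div></div>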
<h2 id="friday">Friday</h2>
<p><code class="language-plaintext highlighter-rouge">ggplot2 better spacing of geom_text stacked bar plot</code> - This brought me to
learn about
<a href="https://stackoverflow.com/a/51134651/6873133">the <code class="language-plaintext highlighter-rouge">lineheight</code> parameter</a>,
but ultimately, I wanted the text <em>not to overlap</em>, and after looking at the
documentation, <code class="language-plaintext highlighter-rouge">geom_text</code> has a built-in parameter <code class="language-plaintext highlighter-rouge">check_overlap</code> for just
this.</p>
<p><code class="language-plaintext highlighter-rouge">ggrepl for stacked bar plot</code> - …But after using the solution above, I
realized that <code class="language-plaintext highlighter-rouge">check_overlap</code> actually <em>removes</em> text that overlaps, which I
didn’t want. I then found this post using <code class="language-plaintext highlighter-rouge">ggrepel</code>. I knew about this package
but wasn’t sure if it was useful for
<a href="https://stackoverflow.com/a/55817548/6873133">stacked bar plots</a>.
The example here kind of works,
except it changes the location of text I don’t want moving, like in the larger
bars. I abandoned this and simply removed “bars” with zero values.</p>
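<p>A rough sketch of the two approaches that need no extra package, using made-up data with one zero-value segment. It is only meant to show the difference between dropping labels and dropping rows.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code>library(ggplot2)

# Toy data, invented for illustration; one segment has a value of zero
df <- data.frame(
  bar   = rep(c("x", "y"), each = 3),
  part  = rep(c("a", "b", "c"), times = 2),
  value = c(10, 0, 1, 8, 2, 1)
)

# check_overlap = TRUE does not draw labels that would overlap earlier ones
p_dropped <- ggplot(df, aes(bar, value, fill = part, label = value)) +
  geom_col() +
  geom_text(position = position_stack(vjust = 0.5), check_overlap = TRUE)

# Alternative: remove zero-value rows first and keep every remaining label
nonzero <- subset(df, value > 0)
p_nonzero <- ggplot(nonzero, aes(bar, value, fill = part, label = value)) +
  geom_col() +
  geom_text(position = position_stack(vjust = 0.5))
</code></pre></div></div>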
<p><code class="language-plaintext highlighter-rouge">ggplot2 show all factors in legend</code> - Added a <code class="language-plaintext highlighter-rouge">drop = FALSE</code> in there,
found <a href="https://stackoverflow.com/a/33765825/6873133">here</a></p>
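<p>A minimal sketch, assuming a toy factor with one unused level. My understanding is that recent ggplot2 versions also need <code class="language-plaintext highlighter-rouge">show.legend = TRUE</code> on the layer before unused levels get a legend key, so it is included here as a precaution.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code>library(ggplot2)

# Level "c" never appears in the data
df <- data.frame(
  x = 1:2,
  y = 1:2,
  g = factor(c("a", "b"), levels = c("a", "b", "c"))
)

p <- ggplot(df, aes(x, y, colour = g)) +
  geom_point(show.legend = TRUE) +
  scale_colour_discrete(drop = FALSE)  # keep unused levels in the legend
p
</code></pre></div></div>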
<p><code class="language-plaintext highlighter-rouge">ggplot2 stacked bar plot position dodge with change in x</code> - I was frustrated
with where the text annotations for my columns were.
<a href="https://stackoverflow.com/a/58256551/6873133">This solution here</a>
didn’t exactly solve it outright for me, but it did show me what’s possible for
moving the column labels around. The parameters I was looking for were simply
the <code class="language-plaintext highlighter-rouge">x</code> and <code class="language-plaintext highlighter-rouge">y</code> aesthetics, which allow me to fine-tune where my text labels
are. In hindsight, this makes sense.</p>
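<p>In sketch form, with made-up totals and an arbitrary offset of 0.5, giving the text layer its own <code class="language-plaintext highlighter-rouge">y</code> aesthetic looks like this:</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code>library(ggplot2)

# Toy totals, invented for illustration
totals <- data.frame(group = c("a", "b", "c"), n = c(12, 7, 15))

p <- ggplot(totals, aes(group, n)) +
  geom_col() +
  # The label layer overrides y, so each label sits 0.5 units above its bar
  geom_text(aes(y = n + 0.5, label = n))
p
</code></pre></div></div>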
<p>Guess I was wanting to be a bit more verbose on my thoughts on these
challenges. At this point, I was doing some very custom changes to my plots.</p>
<h2 id="reflection">Reflection</h2>
<p>My conclusion is similar to that of the software engineering post I linked at the
beginning: a data scientist will still need to search and look things up.
Regularly.</p>
<p>I’ve never really thought too much about what I’ve had to search for during my
job. This turned out to be a really fun exercise in mindfulness. Ideally, I
would keep track of these kinds of searches and then find ways to write helper
functions to do these things for me. Alas, a low priority for now. But a
possible side project idea.</p>
<p>Altogether, thank you Stack Overflow solutions, the whole ggplot2 system, and
the countless volunteers out there writing out their solutions on the web for
making my work possible.</p>Eric T Leungleung@erictleung.comI ran across this blog post from a software engineer who decided to document what they googled in a week of work.Git shallow clone for faster version control2022-01-06T00:00:00-08:002022-01-06T00:00:00-08:00https://erictleung.com/git-clone-depth<p>Contributing to open-source software is great fun. The feeling of being a part
of a larger community and adding to something larger than yourself. As a
consequence, you work on large projects with lots of version control history.</p>
<p>This post is a reminder to myself to use this <code class="language-plaintext highlighter-rouge">git clone</code> flag to make it
easier on my hard drive and make git work faster when doing day-to-day version
control commands.</p>
<p>The key flag is the <code class="language-plaintext highlighter-rouge">--depth</code> flag. According to the documentation, this flag
helps to</p>
<blockquote>
<p>Create a shallow clone with a history truncated to the specified number of
commits.</p>
</blockquote>
<p>The specified number of commits is an integer that comes after the <code class="language-plaintext highlighter-rouge">--depth</code>
flag.</p>
<p>For example, I worked on the
<a href="https://github.com/freeCodeCamp/freeCodeCamp">freeCodeCamp main repository</a>
and it has twenty-nine thousand commits as of this writing. This is a lot.</p>
<p>So to clone this repository without the many commits that I won’t need,
I can run this command.</p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>git clone <span class="nt">--depth</span> 100 https://github.com/freeCodeCamp/freeCodeCamp.git
</code></pre></div></div>
<p>This will get only the last 100 commits from this repository. More on this flag
here
<a href="https://book.git-scm.com/docs/git-clone">https://book.git-scm.com/docs/git-clone</a>.</p>Eric T Leungleung@erictleung.comContributing to open-source software is great fun. The feeling of being a part of a larger community and adding to something larger than yourself. As a consequence, you work on large projects with lots of version control history.On creating the pixarfilms R package2021-03-03T00:00:00-08:002021-03-03T00:00:00-08:00https://erictleung.com/on-creating-pixarfilms-r-package<p>I’ve never published an R package all the way to CRAN before. So I finally
decided it was time. So here, I will make brief notes of steps I took to
publish it to CRAN and some resources that helped me along the way.</p>
<p><em>Note</em>: this is a data-specific package, so these notes are light
on functions that would be useful for a package with more code.</p>
<h2 id="getting-the-data-using-the-rvest-package">Getting the data using the {rvest} package</h2>
<p>I like Pixar films and so I wanted to create a package to explore information
about these films.</p>
<p>The data I wanted to scrape was on Wikipedia
<a href="https://en.wikipedia.org/wiki/List_of_Pixar_films">here</a>.</p>
<p>The package that came to mind was the
<a href="https://rvest.tidyverse.org/">{rvest} package</a>
to help me scrape the information.</p>
<p>I have also seen the {rvest} package used along with the
<a href="https://github.com/dmi3kno/polite">{polite} package</a>
to scrape data. But unfortunately, I had some issues using the {polite} package
(version 0.1.1) on my Windows machine where R couldn’t find a function.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Error in validate_key(key) : could not find function "validate_key"
</code></pre></div></div>
<p>So I abandoned using it. In the future, I will revisit this package and hope I
will be able to use it next time.</p>
<h2 id="saving-data-out-using-the-usethis-package">Saving data out using the {usethis} package</h2>
<p>To save out the CSV versions of these files, I wanted to automate how I write
out the files. So below, I wrote a simple function that will take the object
you want to save and save it out as a CSV file in the <code class="language-plaintext highlighter-rouge">data-raw/</code> directory.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cd">#' Save out for external use</span><span class="w">
</span><span class="cd">#'</span><span class="w">
</span><span class="cd">#' Write out a data frame to a CSV into the `data-raw/` directory with the same</span><span class="w">
</span><span class="cd">#' name as the data frame itself.</span><span class="w">
</span><span class="cd">#'</span><span class="w">
</span><span class="cd">#' @param x data.frame</span><span class="w">
</span><span class="cd">#'</span><span class="w">
</span><span class="cd">#' @examples</span><span class="w">
</span><span class="cd">#' # Saves the mtcars dataset to the path `data-raw/mtcars.csv`</span><span class="w">
</span><span class="cd">#' save_data(mtcars)</span><span class="w">
</span><span class="n">save_data</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="k">function</span><span class="p">(</span><span class="n">x</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
  </span><span class="c1"># Notes on deparse() and substitute()</span><span class="w">
  </span><span class="c1"># https://stackoverflow.com/a/14577878/6873133</span><span class="w">
  </span><span class="n">str_path</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">paste0</span><span class="p">(</span><span class="n">deparse</span><span class="p">(</span><span class="nf">substitute</span><span class="p">(</span><span class="n">x</span><span class="p">)),</span><span class="w"> </span><span class="s2">".csv"</span><span class="p">)</span><span class="w">
  </span><span class="n">readr</span><span class="o">::</span><span class="n">write_csv</span><span class="p">(</span><span class="n">x</span><span class="p">,</span><span class="w"> </span><span class="n">here</span><span class="o">::</span><span class="n">here</span><span class="p">(</span><span class="s2">"data-raw"</span><span class="p">,</span><span class="w"> </span><span class="n">str_path</span><span class="p">))</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>
<p>These files are only used to keep a CSV record of the data.</p>
<p>The more important file to save is the <code class="language-plaintext highlighter-rouge">.rda</code> file so that R can read it when
you use the package. We can use the <code class="language-plaintext highlighter-rouge">usethis::use_data()</code> function to
save it in the right place and in the right format. (<em>Note</em>: the {usethis}
package is an amazing helper package for developing other R packages.)</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">x</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">sample</span><span class="p">(</span><span class="m">1000</span><span class="p">)</span><span class="w">
</span><span class="c1"># Saves both the object x and mtcars</span><span class="w">
</span><span class="n">usethis</span><span class="o">::</span><span class="n">use_data</span><span class="p">(</span><span class="n">x</span><span class="p">,</span><span class="w"> </span><span class="n">mtcars</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>
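<p>To make the <code class="language-plaintext highlighter-rouge">.rda</code> format less mysterious, here is a base R sketch of roughly what happens to each object. As far as I understand, <code class="language-plaintext highlighter-rouge">use_data()</code> wraps <code class="language-plaintext highlighter-rouge">save()</code> and writes into the package’s <code class="language-plaintext highlighter-rouge">data/</code> directory; the temporary path below is a stand-in so nothing touches a real package.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Stand-in path; a package would use data/x.rda instead
tmp <- file.path(tempdir(), "x.rda")

x <- sample(1000)
save(x, file = tmp)  # serialize the object under its own name

rm(x)                # pretend we are in a fresh session
load(tmp)            # restores an object called `x`
length(x)
</code></pre></div></div>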
<p>More on this can be found at
<a href="https://r-pkgs.org/data.html">https://r-pkgs.org/data.html</a>.</p>
<h2 id="basic-package-setup">Basic package setup</h2>
<p>A major resource that helped me all the way through and suggested some
useful packages along the way can be found
<a href="https://www.mzes.uni-mannheim.de/socialsciencedatalab/article/r-package/">here</a>.</p>
<p>It is a long read but it goes way more in-depth than I will.</p>
<p>I also used Hadley Wickham’s
<a href="https://github.com/hadley/babynames">{babynames} package repository</a>
as a template for things I should look for when creating my own data
package.</p>
<p>To start, here are some basic packages to install/load.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">library</span><span class="p">(</span><span class="n">roxygen2</span><span class="p">)</span><span class="w"> </span><span class="c1"># Documentation</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">devtools</span><span class="p">)</span><span class="w"> </span><span class="c1"># Development</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">testthat</span><span class="p">)</span><span class="w"> </span><span class="c1"># Testing</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">usethis</span><span class="p">)</span><span class="w">  </span><span class="c1"># Package setup helpers</span><span class="w">
</span></code></pre></div></div>
<h2 id="create-basic-tests-using-the-testthat-package">Create basic tests using the {testthat} package</h2>
<p>Because this is a simple data package, there isn’t much testing required.
However, in mirroring Hadley Wickham’s
<a href="https://github.com/hadley/babynames">{babynames} R package</a>,
I added some tests to check if the data has changed since I last ran it.</p>
<p>Here is a little bit of code that I’ve used.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">test_that</span><span class="p">(</span><span class="s2">"Pixar films head and tail"</span><span class="p">,</span><span class="w"> </span><span class="p">{</span><span class="w">
  </span><span class="n">expect_known_output</span><span class="p">(</span><span class="w">
    </span><span class="n">first_last</span><span class="p">(</span><span class="n">pixar_films</span><span class="p">),</span><span class="w">
    </span><span class="s2">"test-data_pixar_films.txt"</span><span class="p">,</span><span class="w">
    </span><span class="n">print</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">TRUE</span><span class="w">
  </span><span class="p">)</span><span class="w">
</span><span class="p">})</span><span class="w">
</span></code></pre></div></div>
<p>Here are five notable points about the test above:</p>
<ul>
<li>Use the <code class="language-plaintext highlighter-rouge">test_that()</code> function to create a test from the {testthat} package</li>
<li>The first quoted parameter is the name of the test (here it is “Pixar films
head and tail”)</li>
<li>The <code class="language-plaintext highlighter-rouge">expect_known_output()</code> function compares data to some file output</li>
<li>That file output is found in the same directory as your tests</li>
<li>The output file is a simple text file; here named as
<code class="language-plaintext highlighter-rouge">test-data_pixar_films.txt</code></li>
</ul>
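<p>The test calls a <code class="language-plaintext highlighter-rouge">first_last()</code> helper that is not shown in this post. A hypothetical version, assuming it simply stacks the first and last few rows of a data frame, could look like this:</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical helper: the real definition is not shown in this post
first_last <- function(df, n = 6) {
  df <- as.data.frame(df)
  rbind(head(df, n), tail(df, n))
}

# Demonstrated on a built-in dataset rather than pixar_films
first_last(mtcars)
</code></pre></div></div>
<p>Printing only the head and tail keeps the reference text file small while still catching changes at either end of the data.</p>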
<h2 id="pkgdown-setup-with-github-actions">pkgdown setup with GitHub Actions</h2>
<p>GitHub Actions help automate testing and deployment of your website,
conveniently all within GitHub. Here are some convenience functions to set them
up.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Automate deployment of your website</span><span class="w">
</span><span class="n">usethis</span><span class="o">::</span><span class="n">use_github_action</span><span class="p">(</span><span class="s2">"pkgdown"</span><span class="p">)</span><span class="w">
</span><span class="c1"># Automate testing your package</span><span class="w">
</span><span class="n">usethis</span><span class="o">::</span><span class="n">use_github_action_check_release</span><span class="p">()</span><span class="w">
</span></code></pre></div></div>
<p>This will set up GitHub to deploy your website to your <code class="language-plaintext highlighter-rouge">gh-pages</code> branch. After
going to your repository settings, you can change it so that your website will
host from there instead of your <code class="language-plaintext highlighter-rouge">main</code> branch.</p>
<p>Luckily, most of the configuration is done for you, but in case you are curious,
I found
<a href="https://docs.github.com/en/actions">GitHub Actions’ documentation</a>
helpful and clear on how to set it up. The
<a href="https://docs.github.com/en/actions/reference/workflow-syntax-for-github-actions">“Workflow syntax for GitHub Actions” section</a>
was a great reference.</p>
<p>For R specifically, you can find all of these GitHub Actions at
<a href="https://github.com/r-lib/actions/tree/master/examples">https://github.com/r-lib/actions/tree/master/examples</a>.</p>
<h2 id="create-a-hexsticker-logo-using-the-hexsticker-package">Create a hexsticker logo using the {hexSticker} package</h2>
<p>I used the
<a href="https://github.com/GuangchuangYu/hexSticker">{hexSticker} package</a>
to help generate the logo. Take a look at the examples in their README to find
common use cases. My use case was to use an external image, which you do by
specifying a path to the image when you pass it into the <code class="language-plaintext highlighter-rouge">sticker()</code> function.</p>
<pre><code class="language-{r}">library(hexSticker)
library(showtext)
# Add Google Font
font_add_google("Cormorant Garamond", "garamond")
showtext_auto() # Use this font in all rendering
imgurl <- "man/figures/SeekPng.com_pixar-lamp-png_1678537.png"
sticker(
  imgurl,
  # Package settings
  package = "pixarfilms",
  p_size = 25,
  p_color = "#000000",
  p_family = "garamond",
  # Hexagon settings
  h_fill = "#89B9F7",
  h_color = "#000000",
  # Subplot or image settings
  s_x = 1,
  s_y = 0.75,
  s_width = 0.35,
  filename = "man/figures/logo.png"
)
</code></pre>
<p>I ran across the website
<a href="http://tinypng.com/">TinyPNG</a>,
which can compress your images. This can be useful in keeping the size of your
package small. Alternatively, you can opt to use the
<a href="https://github.com/jmablog/tinieR">{tinieR} R package</a>
to do things all within R.</p>
<h2 id="finishing-touches-and-submitting-to-cran">Finishing touches and submitting to CRAN</h2>
<p>At this point, we can take a look at the
<a href="https://r-pkgs.org/release.html#release-submission">“Release a package” section</a>
of the R packages book.</p>
<p>You can spell check your package documentation.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Performs spell check</span><span class="w">
</span><span class="n">devtools</span><span class="o">::</span><span class="n">spell_check</span><span class="p">()</span><span class="w">
</span><span class="c1"># Creates word list for any words not standard, e.g., Pixar</span><span class="w">
</span><span class="n">usethis</span><span class="o">::</span><span class="n">use_spell_check</span><span class="p">()</span><span class="w">
</span></code></pre></div></div>
<p>As of this writing, there appears to be a bug in the <code class="language-plaintext highlighter-rouge">rhub::check()</code>
function that produces an error claiming there is no “utf8” package. A helpful
hint that I found
<a href="https://github.com/r-hub/rhub/issues/374#issuecomment-629350910">here</a>
says to run this instead.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Using rhub</span><span class="w">
</span><span class="n">rhub</span><span class="o">::</span><span class="n">check</span><span class="p">(</span><span class="w">
  </span><span class="n">platform</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"windows-x86_64-devel"</span><span class="p">,</span><span class="w">
  </span><span class="n">env_vars</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="n">R_COMPILE_AND_INSTALL_PACKAGES</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"always"</span><span class="p">)</span><span class="w">
</span><span class="p">)</span><span class="w">
</span><span class="c1"># Or using devtools</span><span class="w">
</span><span class="n">devtools</span><span class="o">::</span><span class="n">check_rhub</span><span class="p">(</span><span class="w">
  </span><span class="n">platform</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"windows-x86_64-devel"</span><span class="p">,</span><span class="w">
  </span><span class="n">env_vars</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="n">R_COMPILE_AND_INSTALL_PACKAGES</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"always"</span><span class="p">))</span><span class="w">
</span></code></pre></div></div>
<p>Once those are complete, you can then use the following to submit to CRAN.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">devtools</span><span class="o">::</span><span class="n">release</span><span class="p">()</span><span class="w">
</span></code></pre></div></div>
<p>This will run automated checks and ask a series of questions making sure you’ve
performed a number of checks like the rhub check. Afterward, it
will automatically submit your package to CRAN.</p>
<h2 id="in-sum">In sum</h2>
<p>Above are some notes to me and others on how I created my {pixarfilms} R
package.</p>
<p>Here are useful resources I used and will refer back to:</p>
<ul>
<li><a href="https://www.mzes.uni-mannheim.de/socialsciencedatalab/article/r-package/">One page overview with great recommended packages</a></li>
<li><a href="https://r-pkgs.org/release.html">The R Packages book</a> for reference</li>
<li><a href="https://kalimu.github.io/post/checklist-for-r-package-submission-to-cran/">Dense one page checklist when updating your package</a></li>
</ul>Eric T Leungleung@erictleung.comI’ve never published an R package all the way to CRAN before. So I finally decided it was time. So here, I will make brief notes of steps I took to publish it to CRAN and some resources that helped me along the way.Setup GitHub Actions to validate repository links2021-02-11T00:00:00-08:002021-02-11T00:00:00-08:00https://erictleung.com/setup-github-actions-check-links<p>I think there’s a movement to move some continuous integration from
<a href="https://trends.google.com/trends/explore?date=today%205-y&geo=US&q=Travis%20CI,GitHub%20Actions">Travis CI to GitHub Actions</a>.</p>
<p>So here’s a post on how I converted one of my repositories, first by reviewing
some of the GitHub interfaces and then creating it through the terminal.</p>
<p>So my repository
<a href="https://github.com/erictleung/awesome-nosql-guides/">awesome-nosql-guides</a>
has a tab labeled
<a href="https://github.com/erictleung/awesome-nosql-guides/actions">“Actions”</a>.</p>
<p>Going there, you’re shown a screen talking about workflows here and there. If
you haven’t set one of these up, this page should be mostly blank.</p>
<p>There is a handy workflow template that GitHub starts up for you if you click
on “New Workflow”. Although it will be inaccessible to you, my new workflow
link would look like this:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>https://github.com/erictleung/awesome-nosql-guides/actions/new
</code></pre></div></div>
<p>There is also a
<a href="https://docs.github.com/en/actions/quickstart">Quickstart for GitHub Actions</a>
available.</p>
<p>But going through and creating this workflow, I found this page
<a href="https://docs.github.com/en/actions/reference/workflow-syntax-for-github-actions">“Workflow syntax for GitHub Actions”</a>
the most useful. The documentation is very clear on what is what once you get
used to reading <a href="https://yaml.org/">YAML syntax</a>.</p>
<p>From the terminal, you’ll need to create a folder called <code class="language-plaintext highlighter-rouge">workflows/</code> within
the <code class="language-plaintext highlighter-rouge">.github/</code> directory. If you’re in the root of your project, you can run
this.</p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Create folders and parent directories as needed</span>
<span class="nb">mkdir</span> <span class="nt">-p</span> .github/workflows
</code></pre></div></div>
<p>The <code class="language-plaintext highlighter-rouge">workflows/</code> directory is where you’ll create your workflows.
Essentially, this is where all your translated <code class="language-plaintext highlighter-rouge">.travis.yml</code> configurations
will go.</p>
<p>Here’s an annotated GitHub Action I set up.</p>
<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Name of your workflow that GitHub displays</span>
<span class="na">name</span><span class="pi">:</span> <span class="s">Check Resources</span>
<span class="c1"># Name of GitHub event that activates the workflow (required)</span>
<span class="na">on</span><span class="pi">:</span> <span class="pi">[</span><span class="nv">push</span><span class="pi">,</span><span class="nv">pull_request</span><span class="pi">]</span>
<span class="c1"># List of jobs to be run for workflow</span>
<span class="na">jobs</span><span class="pi">:</span>
  <span class="c1"># Name of job</span>
  <span class="na">validate_links</span><span class="pi">:</span>
    <span class="na">name</span><span class="pi">:</span> <span class="s">Validate links</span> <span class="c1"># optional</span>
    <span class="na">runs-on</span><span class="pi">:</span> <span class="s">ubuntu-latest</span> <span class="c1"># type of machine to run on</span>
    <span class="na">steps</span><span class="pi">:</span>
      <span class="c1"># The steps below reuse published actions via `uses`</span>
      <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">Checkout source files</span>
        <span class="na">uses</span><span class="pi">:</span> <span class="s">actions/checkout@v2</span>
      <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">Setup Ruby </span><span class="m">2.6</span>
        <span class="na">uses</span><span class="pi">:</span> <span class="s">ruby/setup-ruby@v1</span>
        <span class="na">with</span><span class="pi">:</span>
          <span class="na">ruby-version</span><span class="pi">:</span> <span class="m">2.6</span>
      <span class="c1"># You can also run your custom commands if no published action exists</span>
      <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">Run checks on links</span>
        <span class="na">run</span><span class="pi">:</span> <span class="pi">|</span>
          <span class="s">gem install awesome_bot</span>
          <span class="s">awesome_bot --allow-ssl --allow 302,429 --allow-dupe -f README.md</span>
</code></pre></div></div>
<p>Here is a list of
<a href="https://docs.github.com/en/actions/reference/workflow-syntax-for-github-actions#jobsjob_idruns-on">operating systems</a>
you can place within the <code class="language-plaintext highlighter-rouge">jobs.&lt;job_id&gt;.runs-on</code> option above.</p>
<p>You can take a look at more GitHub Actions on this
<a href="https://github.com/sdras/awesome-actions">Awesome-Actions</a>
page with a curated list of great things you can do with GitHub Actions. I hope
to make use of this feature more in the future.</p>Eric T Leungleung@erictleung.comI think there’s a movement to move some continuous integration from Travis CI to GitHub Actions.Reflecting on exploratory versus explanatory data visualization2021-01-16T00:00:00-08:002021-01-16T00:00:00-08:00https://erictleung.com/exploratory-vs-explanatory-data-viz<p>I still haven’t created examples
<a href="https://github.com/rfordatascience/tidytuesday">for the <code class="language-plaintext highlighter-rouge">#TidyTuesday</code> project</a>.</p>
<p>But in looking at other submissions and comparing them with some of the
visualizations I was preparing to create, I had some real insight into the
difference between exploratory and explanatory data visualizations as I
reflected on why I liked certain examples more than others and my own.</p>
<p><a href="https://en.wikipedia.org/wiki/Exploratory_data_analysis">Exploratory data analysis</a>,
as the name implies, is about exploring the data. These figures can be quite
complex and show a lot of data.</p>
<p>I noticed this
<a href="https://github.com/charlie-gallagher/tidy-tuesday/blob/19ae39e9e0b3f9ba484c6a453fe1899e9b8ed2ee/art_collections/art_collection.png">faceted plot</a>
example. It is a nice faceted plot and cannot be understood with one look.
It took me some time to read the legend and scan back and forth across all the
years to understand its meaning.</p>
<p>This is what makes this a good exploratory plot. It invites the viewer to
explore and think about the work and data.</p>
<p>Although this is a <em>complex</em> exploratory plot, I think it is an exemplar for an
exploratory plot, much like an infographic.</p>
<p>Here’s another good exploratory plot showing a network of the 300 most common
transatlantic slave routes.</p>
<blockquote class="twitter-tweet"><p lang="en" dir="ltr"><a href="https://twitter.com/hashtag/TidyTuesday?src=hash&ref_src=twsrc%5Etfw">#TidyTuesday</a> Week 25. <br />Network graph linking the 300 most common transatlantic slave routes. The routes are grouped according to random walks, highlighting some of the colonies of each nation. <a href="https://t.co/QzH1EdHc02">pic.twitter.com/QzH1EdHc02</a></p>— MissingNotAtRandom (@AtMissing) <a href="https://twitter.com/AtMissing/status/1273735843195297792?ref_src=twsrc%5Etfw">June 18, 2020</a></blockquote>
<script async="" src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
<p>I really enjoyed this plot because of the various annotations scattered
throughout the visual. These enhance the plot’s meaning and understanding.</p>
<p>On the other hand, there are explanatory plots.</p>
<p>These have more thought and purpose to
<a href="https://www.storytellingwithdata.com/blog/2014/04/exploratory-vs-explanatory-analysis">what they wish to show</a>.</p>
<p>For example, the linked plot below is comparing the number of paintings
acquired from a prolific artist, Joseph Mallord William Turner, versus everyone
else.</p>
<blockquote class="twitter-tweet"><p lang="en" dir="ltr">I went ultra-simple
for <a href="https://twitter.com/hashtag/TidyTuesday?src=hash&ref_src=twsrc%5Etfw">#TidyTuesday</a>, but a new thing for me was using the {<a href="https://twitter.com/hashtag/glue?src=hash&ref_src=twsrc%5Etfw">#glue</a>} 📦 which I love. Code on my GitHub <a href="https://t.co/eMRb3GP0G0">https://t.co/eMRb3GP0G0</a>. A visualisation about the Tate's favourite artist. <a href="https://t.co/EfzRddNAbm">pic.twitter.com/EfzRddNAbm</a></p>— Jack Davison (@JDavison_) <a href="https://twitter.com/JDavison_/status/1350038790392475649?ref_src=twsrc%5Etfw">January 15, 2021</a></blockquote>
<script async="" src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
<p>These plots are typically not complex. The above plot is a standard histogram
you learn in middle or high school. However, it is very effective in telling
you a “story” or message.</p>
<p>To me, it shows</p>
<ul>
<li>how prolific an artist Joseph Mallord William Turner was, and</li>
<li>how many paintings the Tate Art Museum has acquired.</li>
</ul>
<p>These points are immediately clear.</p>
<p>I wrote this post because as I was creating my own visualizations for the
<code class="language-plaintext highlighter-rouge">#TidyTuesday</code> project, I noticed how I didn’t feel as drawn to my examples as
much as these others I found.</p>
<p>I then reflected on what kind of plot I was making and what insights or
information I could learn from the plot. I realized I didn’t have a clear
purpose in creating the plot other than to use a particular ggplot2 extension package,
<a href="https://www.r-bloggers.com/2016/04/ggplot2-exercising-with-ggalt-dumbbells/"><code class="language-plaintext highlighter-rouge">ggalt</code></a>.</p>
<p>Although I may be overthinking it, this single exploration into the
<code class="language-plaintext highlighter-rouge">#TidyTuesday</code> project has reminded me of what makes a good visualization. I
hope to finally participate, share, and continue to learn from making more
visualizations.</p>
<p>As a side note, a great resource on exploratory data analysis is
<a href="https://www.itl.nist.gov/div898/handbook/eda/eda.htm">NIST’s Engineering Statistics Handbook</a>.</p>
Speed up Anaconda load on WSL
2020-06-23T00:00:00-07:00
https://erictleung.com/speed-up-anaconda-load-on-wsl
<p>I use the
<a href="https://docs.microsoft.com/en-us/windows/wsl/install-win10">Windows Subsystem for Linux</a>
on my work computer. Lately, my Linux shell had been taking too long to start
for my taste, so I set out to figure out why. I managed to cut a nearly
15-second wait (an eternity in programming) down to a nearly instantaneous
startup. There is a slight caveat, but I don’t mind the extra
inconvenience.</p>
<p>After a lot of searching around, I found out that my Anaconda/miniconda
initialization was hogging all the time. This is because my setup, by default,
calls <code class="language-plaintext highlighter-rouge">conda activate base</code> every time I create a shell. The
final solution removes this step, so you activate the environment manually
whenever you need it.</p>
<p>Before I figured out the eventual solution, I tried to blame the WSL shell itself.
Looking around, I found there was an upgrade to WSL 2 available. This brought
me to threads like <a href="https://github.com/microsoft/WSL/issues/4737">this one</a>.</p>
<p>One
<a href="https://github.com/microsoft/WSL/issues/4737#issuecomment-565201243">solution</a>
suggested running</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">sudo </span>apt update
<span class="nb">sudo </span>apt dist-upgrade
</code></pre></div></div>
<p>This gave me hope, but it didn’t work. Because this was a work computer and I
didn’t want to risk everything breaking after upgrading to a new system, I
eventually abandoned this potential solution.</p>
<p>This frustration then brought me to
<a href="https://github.com/ContinuumIO/anaconda-issues/issues/10173">this thread</a>.
It sounds like I’m not the only one who has experienced this lag time. Even
though the thread is from 2018, it still seemed relevant.</p>
<p>I gave their solutions a try. No luck.</p>
<p>The first thing I tried was to
<a href="https://github.com/ContinuumIO/anaconda-issues/issues/10173#issuecomment-441386441">change the absolute path to a relative one</a>.
I was skeptical this would work. And I was right in thinking so.</p>
<p>Scrolling down in the thread a bit more, I came across
<a href="https://github.com/ContinuumIO/anaconda-issues/issues/10173#issuecomment-444243367">this comment</a>.
Near the bottom of the comment, it suggests commenting out the code between <code class="language-plaintext highlighter-rouge"># >>>
conda initialize >>></code> and <code class="language-plaintext highlighter-rouge"># <<< conda initialize <<<</code>, and then keeping only a copy of the inner
<code class="language-plaintext highlighter-rouge">if</code>/<code class="language-plaintext highlighter-rouge">else</code> statements.</p>
<p>In my bash configuration (which should be in either <code class="language-plaintext highlighter-rouge">.bashrc</code> or
<code class="language-plaintext highlighter-rouge">.bash_profile</code>), I have the following:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># >>> conda initialize >>></span>
<span class="c"># !! Contents within this block are managed by 'conda init' !!</span>
<span class="nv">__conda_setup</span><span class="o">=</span><span class="s2">"</span><span class="si">$(</span><span class="s1">'/home/leunge/miniconda/bin/conda'</span> <span class="s1">'shell.bash'</span> <span class="s1">'hook'</span> 2> /dev/null<span class="si">)</span><span class="s2">"</span>
<span class="k">if</span> <span class="o">[</span> <span class="nv">$?</span> <span class="nt">-eq</span> 0 <span class="o">]</span><span class="p">;</span> <span class="k">then
</span><span class="nb">eval</span> <span class="s2">"</span><span class="nv">$__conda_setup</span><span class="s2">"</span>
<span class="k">else
if</span> <span class="o">[</span> <span class="nt">-f</span> <span class="s2">"</span><span class="nv">$HOME</span><span class="s2">/miniconda/etc/profile.d/conda.sh"</span> <span class="o">]</span><span class="p">;</span> <span class="k">then</span>
<span class="nb">.</span> <span class="s2">"</span><span class="nv">$HOME</span><span class="s2">/miniconda/etc/profile.d/conda.sh"</span>
<span class="k">else
</span><span class="nb">export </span><span class="nv">PATH</span><span class="o">=</span><span class="s2">"</span><span class="nv">$HOME</span><span class="s2">/miniconda/bin:</span><span class="nv">$PATH</span><span class="s2">"</span>
<span class="k">fi
fi
</span><span class="nb">unset </span>__conda_setup
<span class="c"># <<< conda initialize <<<</span>
</code></pre></div></div>
<p>I commented most of that out and copied out that inner <code class="language-plaintext highlighter-rouge">if</code> block.</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Use $HOME rather than "~": bash does not expand a tilde inside double quotes</span>
<span class="k">if</span> <span class="o">[</span> <span class="nt">-f</span> <span class="s2">"</span><span class="nv">$HOME</span><span class="s2">/miniconda/etc/profile.d/conda.sh"</span> <span class="o">]</span><span class="p">;</span> <span class="k">then</span>
<span class="nb">.</span> <span class="s2">"</span><span class="nv">$HOME</span><span class="s2">/miniconda/etc/profile.d/conda.sh"</span>
<span class="k">else
</span><span class="nb">export </span><span class="nv">PATH</span><span class="o">=</span><span class="s2">"</span><span class="nv">$HOME</span><span class="s2">/miniconda/bin:</span><span class="nv">$PATH</span><span class="s2">"</span>
<span class="k">fi</span>
</code></pre></div></div>
<p>Previously, my shell configuration essentially ran <code class="language-plaintext highlighter-rouge">conda activate base</code> with
every new shell. With this new setup, I am no longer in an activated
environment.</p>
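<p>Since activation is now a manual step, a short wrapper can save some typing. This is only a sketch: <code class="language-plaintext highlighter-rouge">ca</code> is a hypothetical name chosen for illustration (it is not part of conda), and it assumes <code class="language-plaintext highlighter-rouge">conda</code> has already been made available by the <code class="language-plaintext highlighter-rouge">if</code> block above.</p>

```shell
# Hypothetical shortcut (not part of conda): activate the named environment,
# or "base" when no name is given.
ca() {
    conda activate "${1:-base}"
}
```

<p>With this in the bash configuration, <code class="language-plaintext highlighter-rouge">ca</code> would be equivalent to <code class="language-plaintext highlighter-rouge">conda activate base</code>, and <code class="language-plaintext highlighter-rouge">ca myenv</code> to <code class="language-plaintext highlighter-rouge">conda activate myenv</code> (where <code class="language-plaintext highlighter-rouge">myenv</code> is a placeholder environment name).</p>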
<p>To double check that this was the issue, I timed it.</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span><span class="nb">time </span>conda activate base
real 0m15.461s
user 0m3.188s
sys 0m11.516s
</code></pre></div></div>
<p>Yep. That was the issue.</p>
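<p>The same kind of measurement can be applied to a whole startup file, or to a fragment of one saved to a temporary file, to narrow down which part is slow. The helper below is a rough sketch under a few assumptions: <code class="language-plaintext highlighter-rouge">time_source</code> is a hypothetical name, and it relies on GNU <code class="language-plaintext highlighter-rouge">date</code> for the <code class="language-plaintext highlighter-rouge">%N</code> (nanoseconds) format, which most Linux distributions ship.</p>

```shell
# time_source (hypothetical helper): print how many milliseconds a fresh bash
# process takes to source the given file.
time_source() {
    local file="$1" start end
    start=$(date +%s%N)    # nanoseconds since the epoch (GNU date)
    bash -c 'source "$1"' _ "$file" > /dev/null 2>&1
    end=$(date +%s%N)
    echo $(( (end - start) / 1000000 ))
}
```

<p>For example, <code class="language-plaintext highlighter-rouge">time_source ~/.bashrc</code> would print the elapsed time in milliseconds, which makes it easier to compare a configuration before and after an edit.</p>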
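<p>But can I still access <code class="language-plaintext highlighter-rouge">conda</code> and all of my tools? Yes, with one catch: whenever I need to
be in an Anaconda environment, I have to remember to run <code class="language-plaintext highlighter-rouge">conda activate
base</code> before doing anything else. The <code class="language-plaintext highlighter-rouge">if</code> block above keeps <code class="language-plaintext highlighter-rouge">conda</code> itself available: it sources <code class="language-plaintext highlighter-rouge">conda.sh</code> when that file exists and otherwise
falls back to the <code class="language-plaintext highlighter-rouge">export</code> statement, which puts my Anaconda instance of Python on the <code class="language-plaintext highlighter-rouge">PATH</code>.</p>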
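<p>One way to check what a new shell actually has access to is to ask how the name <code class="language-plaintext highlighter-rouge">conda</code> resolves. The sketch below uses the standard <code class="language-plaintext highlighter-rouge">command -v</code>; <code class="language-plaintext highlighter-rouge">conda_kind</code> is a hypothetical helper name. It assumes that sourcing <code class="language-plaintext highlighter-rouge">conda.sh</code> defines <code class="language-plaintext highlighter-rouge">conda</code> as a shell function, and that <code class="language-plaintext highlighter-rouge">conda activate</code> needs that function rather than only the executable on the <code class="language-plaintext highlighter-rouge">PATH</code>.</p>

```shell
# conda_kind (hypothetical helper): report how this shell resolves "conda".
conda_kind() {
    local resolved
    resolved=$(command -v conda || true)
    case "$resolved" in
        "")    echo "not found" ;;
        conda) echo "shell function" ;;          # e.g. defined by sourcing conda.sh
        /*)    echo "executable at $resolved" ;; # found through PATH
        *)     echo "other: $resolved" ;;        # e.g. an alias
    esac
}
```

<p>If this reports only an executable, <code class="language-plaintext highlighter-rouge">conda activate base</code> will likely complain that the shell has not been set up, which would suggest the <code class="language-plaintext highlighter-rouge">conda.sh</code> path in the <code class="language-plaintext highlighter-rouge">if</code> block is not being found.</p>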
<p>This is a minor inconvenience I’m willing to accept for the sake of time.</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span><span class="nb">time </span>bash ~/.bash_profile
real 0m0.131s
user 0m0.016s
sys 0m0.078s
</code></pre></div></div>