Here is the link to the guest lecture slides on “Using GenAI Tools in Statistical Research Workflow”. The lecture was given to students in Michigan Biostatistics 620, Introduction to Health Data Science (Primary Instructor: Peter Song).

    The prompts used in the demos are available here.

    1. Quick search on command line?

    Want to quickly search in your directories? For example, you want to search for .R files that contain a string "latent", and print out the results. You may hope to do something like:

    • lookfor "latent" ".R"

    Or you want to just look for the files without specifying the file type by something like:

    • lookfor "latent"

    Or you want to page through the results using less:

    • lookfor latent | less (the quotation marks around latent are not needed)

    If achieved, this could speed up your debugging process when you need to find certain variable names or keywords that you identified as useful.

    1.1. Back-story

    Here is an excellent solution I have been using (see more explanation at the end):

    This was motivated by Rafa Irizarry, who was my professor at Hopkins at the time and taught my first-year linear regression class, running R and compiling files in Emacs (I had not been exposed to Unix before my PhD):

    As you can see by comparing the code in the two tweets, I use a different command, rg, which is faster, but both versions should work.

    • Too many outputs? You may append "| less" to the command, e.g., lookfor latent | less, to page through the results when lookfor finds very many files, or long files, that contain the target string latent. However, when piped to less, the results lose syntax highlighting. This is why I do not pipe to less by default: highlighted results are easier on my eyes.
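
    One way to keep the highlighting while paging is sketched below, under stated assumptions. Ripgrep turns colors off when its output goes to a pipe, --color=always forces them back on, and less -R renders the color codes. The function name lookfor_color is hypothetical and is not part of the original script.

```shell
# Hypothetical variant of lookfor that always emits color codes,
# so that highlighting survives a pipe into `less -R`.
lookfor_color() {
  # Same search as the lookfor script, plus --color=always.
  rg --no-heading -i --color=always "$1" --iglob "*$2"
}

# Usage (interactive): lookfor_color latent .R | less -R
```

    The trade-off is that the raw color codes also end up in the output if it is redirected to a file, which may be a reason to keep this separate from the plain lookfor.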

    2. Set things up

    • Create the file at /full/path/to/your/file, e.g., I created a file ~/bin/lookfor containing the following bash code:
      • #!/bin/bash
        rg --no-heading -i "$1" --iglob \*$2	
        
    • Make the file executable:
      • chmod +x /full/path/to/your/file
        
    • Create a symbolic link to the file by following this. The symlink may be in a different location, /usr/local/bin/name_of_new_command. I created a symbolic link with the same name, lookfor (it does not have to be the same).
      • sudo ln -s /full/path/to/your/file /usr/local/bin/name_of_new_command
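
    The three steps above can be collected into one function. This is a minimal sketch: the function name install_lookfor and its two directory arguments are hypothetical, and it calls ln without sudo, so it needs write access to the link directory (for /usr/local/bin, the sudo form shown above may be required instead).

```shell
# Hypothetical helper: install_lookfor SCRIPT_DIR LINK_DIR
# SCRIPT_DIR should be an absolute path so the link resolves from anywhere.
install_lookfor() {
  local script_dir="$1" link_dir="$2"
  mkdir -p "$script_dir" || return 1
  # Step 1: create the file with the two lines of bash code.
  printf '%s\n' '#!/bin/bash' 'rg --no-heading -i "$1" --iglob \*$2' \
    > "$script_dir/lookfor" || return 1
  # Step 2: make the file executable.
  chmod +x "$script_dir/lookfor" || return 1
  # Step 3: create the symbolic link (needs write access to link_dir).
  ln -s "$script_dir/lookfor" "$link_dir/lookfor"
}

# Usage, assuming ~/bin and a writable /usr/local/bin:
# install_lookfor "$HOME/bin" /usr/local/bin
```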
        

    2.1. Example

    For example, because I created the lookfor file at ~/bin/lookfor, I needed to replace /full/path/to/your/file with ~/bin/lookfor; I wanted to type lookfor to execute the file, so I replaced name_of_new_command with lookfor.

    Note that we just created a symbolic link in a different directory, /usr/local/bin, pointing to the now-executable file ~/bin/lookfor.

    Now you may type lookfor with appropriate arguments to do the quick search, save your time, and be less cranky!
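
    To confirm that the setup worked, a quick check along these lines may help. It is only a sketch: check_link is a hypothetical helper name, and it reports whether a path is a symbolic link that resolves to an executable file.

```shell
# Hypothetical helper: check_link PATH
check_link() {
  local link="$1"
  if [ ! -L "$link" ]; then
    echo "$link is not a symbolic link"
    return 1
  fi
  # -x follows the link, so this tests the file it points to.
  if [ ! -x "$link" ]; then
    echo "$link does not point to an executable file"
    return 1
  fi
  echo "$link -> $(readlink "$link")"
}

# Usage: check_link /usr/local/bin/lookfor
```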

    3. More explanation (by GPT-4)

    #!/bin/bash
    rg --no-heading -i "$1" --iglob \*$2	
    

    This Bash script snippet utilizes rg, which is the command for Ripgrep, a fast search tool. The script performs a case-insensitive search for the pattern specified by $1 in files that match the glob pattern *$2.

    Here’s a breakdown of the options used:

    • --no-heading: This option tells Ripgrep not to group matches under a file-name heading; instead, each matching line is printed with its file path as a prefix.

    • -i: This flag makes the search case-insensitive, meaning it will match both upper and lower case letters.

    • $1: This is a placeholder for the first argument passed to the script. It represents the search pattern.

    • --iglob \*$2: The --iglob option restricts the search to files whose names match a glob, ignoring case. The backslash keeps the shell from expanding the *, so Ripgrep receives a literal * that matches any number of any characters, and $2 is a placeholder for the second argument to the script, which specifies the file extension or pattern to search within.

    Overall, when you run this script with two arguments, it searches for the first argument (a text pattern) in all files that match the pattern given by the second argument (typically a file extension), without regard to case, and outputs each matching line prefixed by its file path.
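
    To see how the two arguments are substituted without running a search, a dry run can print the command instead. This is an illustrative sketch: lookfor_dry_run is a hypothetical name, and echo is simply placed in front of the command from the script.

```shell
# Hypothetical dry run: print the rg command line instead of executing it.
lookfor_dry_run() {
  # echo shows the command after the shell has filled in $1 and $2.
  echo rg --no-heading -i "$1" --iglob \*$2
}

lookfor_dry_run latent .R   # prints: rg --no-heading -i latent --iglob *.R
lookfor_dry_run latent      # prints: rg --no-heading -i latent --iglob *
```

    With no second argument the glob is just *, so files of every type are searched, which corresponds to the lookfor "latent" usage.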

    As part of our regular group meeting, we discussed R package building for reproducible and widely cited research. In my opinion, statistical theory and methods work often gets used by more analysts and researchers when there is a public and freely available package. It will likely receive more critical feedback, but this is generally considered a good sign for useful data science products.

    The slides are here; please feel free to send comments.

    This is to be updated constantly as I collect tips adopted over the years. Check back later if you find them useful.

    Tags

    • OSX (my version: macOS Catalina; Version 10.15 Beta (19A501i)) provides “tags” to label and group folders and files by color and text. They are shown on the left side of the Finder window by default (unless you purposefully turned them off). The easy access by group significantly lowers the barrier to starting anything you hope to do (“the path of minimal resistance”). I usually use tags for grouping
      • reviews: set an orange color (with label review_to_do) and tag all the folders that contain the works to be reviewed. For example, the folder name 2019july03BIOM2019XXXM/ indicates the actual due date (2019 July 03) and the abbreviation for the journal (Biometrics). By clicking the tag review_to_do, all the folders to be reviewed are revealed at once, which can then be ordered by due dates to indicate urgency. Once the review for the work in a folder is done, unlabel the folder (and pat yourself on the back, then move on to the next one).
      • current writing project: For me, this will most likely be a paper; create a tag paper_awesomeidea. I usually put the .tex, .bib and zwcommands.sty (my own definition of shortcuts) along with other necessary files (e.g., figs/, references) together. When I have the corresponding R code in another folder, I will also tag that folder with the same color/label.
    01 Nov 2015 by Zhenke Wu

    Math Equations

    We do plenty of math, so I’d like to test out MathJax support.

    Here is an example of MathJax inline rendering — $ 1/x^{2} $. And here is a block rendering:

    \[r_{XY} = \frac{\mathrm{cov}(X,Y)}{\sqrt{\mathrm{var}(X)\mathrm{var}(Y)}}\]

    Now, if we’d like to get serious, we’d do something involving multiline aligned equations, like

    \[\begin{align} \mathcal{N}(t, \mu, \sigma) &= \mathrm{normal} \newline &= \frac{1}{\sqrt{2 \pi} \sigma} e^{-\frac{(t-\mu)^2}{2 \sigma^2}} \end{align}\]

    or even an inline formula like $ \sum_{t=0}^{\infty} \frac{x^t}{t!} = e^x$.

    Or we could try defining a command, like this. $ \newcommand{\water}{\mathrm{H}_{2}\mathrm{O}} $

    Buffer slides off the sides of our tubes like \(\water\) off a duck’s back.

    Or a more fancy set of equations:

    \[\begin{align} \mbox{Union: } & A\cup B = \{x\mid x\in A \mbox{ or } x\in B\} \\ \mbox{Concatenation: } & A\circ B = \{xy\mid x\in A \mbox{ and } y\in B\} \\ \mbox{Star: } & A^\star = \{x_1x_2\ldots x_k \mid k\geq 0 \mbox{ and each } x_i\in A\} \\ \end{align}\]

    Or to write the case likelihood function of the PLCM model (Wu et al. 2015):

    \[Pr(\boldsymbol{M}_i \mid I_i=1) = \sum_{\ell=1}^L\pi_\ell\theta_\ell^{M_{i\ell}}(1-\theta_\ell)^{1-M_{i\ell}}\prod_{j\neq \ell}\psi_j^{M_{ij}}(1-\psi_j)^{1-M_{ij}}\]

    One can also use some doses of number theory…

    {% include JB/video id="0Oazb7IWzbA" provider="youtube" %}