As part of our regular group meeting, we discussed building R packages for reproducible and widely cited research. In my opinion, work on statistical theory and methods gets used by more analysts and researchers when it comes with a public, freely available package. The package will likely attract more critical feedback, but that is generally a good sign for a useful data science product.

The slides are here; please feel free to send comments.

This post will be updated regularly as I collect tips I have adopted over the years. Check back later if you find them useful.


  • macOS (my version: macOS Catalina; Version 10.15 Beta (19A501i)) provides “tags” that let you label and group folders and files by color and text. They are shown on the left side of the Finder window by default (unless you have deliberately turned them off). Easy access by group significantly lowers the barrier to starting whatever you hope to do (“the path of least resistance”). I usually use tags to group:
    • reviews: set an orange color (with the label review_to_do) and tag all the folders that contain work to be reviewed. For example, the folder name 2019july03BIOM2019XXXM/ indicates the actual due date (2019 July 03) and the abbreviation for the journal (Biometrics). Clicking the tag review_to_do reveals all the folders to be reviewed at once, which can then be ordered by due date to indicate urgency. Once the review for the work in a folder is done, remove the tag from the folder (then pat yourself on the back and move on to the next one).
    • current writing project: for me, this will most likely be a paper; create a tag such as paper_awesomeidea. I usually keep the .tex and .bib files and zwcommands.sty (my own shortcut definitions) together with other necessary files (e.g., figs/, references). When the corresponding R code lives in another folder, I tag that folder with the same color and label.
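The due-date naming convention for review folders also makes it easy to order them by urgency in a script. Here is a minimal Python sketch, assuming every folder name starts with a four-digit year, an English month name, and a two-digit day, as in 2019july03BIOM2019XXXM/. The function names `due_date` and `sort_by_due_date` and the second folder name in the usage lines are hypothetical, made up for illustration.

```python
import re
from datetime import date

# Month names as they appear in folder names such as 2019july03BIOM2019XXXM/
MONTHS = {
    name: number
    for number, name in enumerate(
        ["january", "february", "march", "april", "may", "june", "july",
         "august", "september", "october", "november", "december"],
        start=1,
    )
}

# Assumed prefix: four-digit year, month name, two-digit day
PREFIX = re.compile(r"^(\d{4})([a-z]+)(\d{2})")


def due_date(folder_name):
    """Parse the due date from the start of a review folder name."""
    match = PREFIX.match(folder_name.lower())
    if match is None or match.group(2) not in MONTHS:
        raise ValueError(f"no due date found in {folder_name!r}")
    year, month, day = match.groups()
    return date(int(year), MONTHS[month], int(day))


def sort_by_due_date(folder_names):
    """Return the folder names ordered from most to least urgent."""
    return sorted(folder_names, key=due_date)


# The second name is a hypothetical example that follows the same convention.
folders = ["2019july03BIOM2019XXXM/", "2019june15BIOM2019XXXM/"]
print(sort_by_due_date(folders))
```

Parsing the month by name, rather than relying on alphabetical order, matters here because a plain sort would place "july" before "june".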
01 Nov 2015 by Zhenke Wu

Math Equations

We do plenty of math, so I’d like to test out MathJax support.

Here is an example of MathJax inline rendering — $ 1/x^{2} $. And here is a block rendering:

\[r_{XY} = \frac{\mathrm{cov}(X,Y)}{\sqrt{\mathrm{var}(X)\mathrm{var}(Y)}}\]
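As a quick numerical check of the correlation formula above, here is a minimal Python sketch. The function names (`mean`, `covariance`, `correlation`) and the toy data are illustrative assumptions. The code divides by n (the population convention); the ratio is the same under the sample convention because the normalizing constants cancel.

```python
import math


def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def covariance(xs, ys):
    """Population covariance of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def correlation(xs, ys):
    """r_XY = cov(X, Y) / sqrt(var(X) * var(Y)), with var(X) = cov(X, X)."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))


# Toy data: y is an exact increasing linear function of x,
# so the correlation should be 1 up to rounding error.
print(correlation([1, 2, 3], [2, 4, 6]))
```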

Now, if we’d like to get serious, we’d do something involving multiline aligned equations, like

\[\begin{align} \mathcal{N}(t, \mu, \sigma) &= \mathrm{normal} \\ &= \frac{1}{\sqrt{2 \pi} \sigma} e^{-\frac{(t-\mu)^2}{2 \sigma^2}} \end{align}\]

or even an inline formula like $ \sum_{t=0}^{\infty} \frac{x^t}{t!} = e^x$.
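The series identity in the inline formula can be checked numerically with partial sums. This is a minimal sketch; the function name `exp_series` and the default of 30 terms are assumptions chosen for illustration, and the truncated sum is only accurate for moderate values of x.

```python
import math


def exp_series(x, n_terms=30):
    """Partial sum of sum_{t=0}^{n_terms-1} x^t / t!."""
    total = 0.0
    term = 1.0  # x^0 / 0!
    for t in range(n_terms):
        total += term
        term *= x / (t + 1)  # turn x^t / t! into x^(t+1) / (t+1)!
    return total


# Compare the partial sum with the library exponential.
print(exp_series(1.0), math.exp(1.0))
```

Updating each term from the previous one avoids computing large factorials and powers separately.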

Or we could try defining a command, like this. $ \newcommand{\water}{\mathrm{H}_{2}\mathrm{O}} $

Buffer slides off the sides of our tubes like \(\water\) off a duck’s back.

Or a fancier set of equations:

\[\begin{align} \mbox{Union: } & A\cup B = \{x\mid x\in A \mbox{ or } x\in B\} \\ \mbox{Concatenation: } & A\circ B = \{xy\mid x\in A \mbox{ and } y\in B\} \\ \mbox{Star: } & A^\star = \{x_1x_2\ldots x_k \mid k\geq 0 \mbox{ and each } x_i\in A\} \end{align}\]

Or we could write the case likelihood function of the PLCM (Wu et al. 2015):

\[\Pr(\boldsymbol{M}_i \mid I_i=1) = \sum_{\ell=1}^L\pi_\ell\theta_\ell^{M_{i\ell}}(1-\theta_\ell)^{1-M_{i\ell}}\prod_{j\neq \ell}\psi_j^{M_{ij}}(1-\psi_j)^{1-M_{ij}}\]
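To make the formula concrete, here is a minimal Python sketch that evaluates it term by term. It assumes that the measurement vector has L binary entries, that the weights π sum to one, and that θ and ψ are probabilities. The function name `case_likelihood` and all parameter values are hypothetical, chosen only for illustration; this is not the authors' implementation.

```python
def case_likelihood(m, pi, theta, psi):
    """Evaluate sum_l pi_l * theta_l^m_l * (1 - theta_l)^(1 - m_l)
    * prod_{j != l} psi_j^m_j * (1 - psi_j)^(1 - m_j) for a binary vector m."""
    n = len(m)
    total = 0.0
    for l in range(n):
        # Contribution of the l-th measurement under theta_l
        term = pi[l] * theta[l] ** m[l] * (1 - theta[l]) ** (1 - m[l])
        # Remaining measurements follow psi_j
        for j in range(n):
            if j != l:
                term *= psi[j] ** m[j] * (1 - psi[j]) ** (1 - m[j])
        total += term
    return total


# Hypothetical parameter values with L = 2
pi = [0.6, 0.4]
theta = [0.9, 0.8]
psi = [0.1, 0.2]
print(case_likelihood([1, 0], pi, theta, psi))
```

A useful sanity check: because each term is a weighted product of Bernoulli probabilities, summing the function over all 2^L binary vectors should give one whenever the weights sum to one.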

One can also use some doses of number theory…

{% include JB/video id="0Oazb7IWzbA" provider="youtube" %}