Strengthening the R ecosystem

Link to video

Speakers:

20+ Years of Reading Data into R

Session description

For the last 20+ years, I’ve been reading data into R. It all started with the humble scan() function. Then, I used fancy new-fangled file formats, such as parquet and arrow, before progressing onto trendy databases, such as duckdb, for analytics. Besides the fun you can have by messing around with new technologies, when should you consider the above formats? In this talk, I’ll cover a variety of methods for importing data and highlight the good, the bad, and the annoying.

Session notes

Essence of talk: csv just works and it will probably work for a long time out in the future - don’t overcomplicate file formats when small sizes of data. For big data: Use parquet from arrow or nanoparquet.

Parquet

  • Uses metadata to save data smartly. Fx. c(4,4,4,4,4,1,2,2,2,2) parquet reads as “5 times 4, 1 times 1, 4 times 2”, in addition many more features
  • Size of parquet files usually bit smaller than RDS, but biggest gain is in read/write speed
  • arrow/nanoparquet tries as much as possible to do calculations on file rather than in memory

DuckDB

Load data (in csv/parquet/whatever) into DuckDB (think of it as a cache) and use dplyr/duckplyr for fast queries.

Contributing to the R Project

Slides: https://github.com/hturner/positconf2024

Sessions description

Posit provides an amazing set of products to support data science, and we will learn about many great packages and approaches from both Posit and the wider community at posit::conf(2024). But underlying it all are a number of open source tools, notably R and Python. How can we contribute to sustaining these open source projects, so that we can continue to use and build on them?

In this talk I will address this question in the context of the R project. I will give an overview of the ways we can contribute as individuals or companies/organizations, both financially and in kind. Together we can build a more sustainable future for R!

Session notes

See the R contributor site to get information on how to contribute to the R project.

Contribution oppotunities

Little experience:

A bit more experience:

Bug fixer in R:

Working groups

What I Learned Resurrecting an R package

Session description

We hear a lot about creating R packages, but R packages don’t last forever on their own. I describe my experience resurrecting rvertnet, an abandoned ropensci project that had become stale on CRAN. I talk about how I found out the package needed a new maintainer, how I took ownership of the package, and how I decided what needed fixing. I discuss several examples of package repairs I implemented, including fixing outdated CI, removing unnecessary files and dependencies, writing workarounds for deprecated functions, and fixing building of a vignette. Finally, I’ll describe my positive experiences communicating with the old maintainer and submitting a package to CRAN for the first time.

Session notes

  • Contact the old maintainer to hear status and potentially suggest taking over the maintainer role
  • Check through history in documentation, NEWS, issues, PRs, etc.
  • Take a long time to just understand the code
  • First goal: Get to minimal working state

Developing the package

Building Sustainable Open Source Ecosystems: Lessons From the #rstats Community and an NSF Grant

Session description

The blessing and the curse of open-source software is that it lacks the infrastructure of a corporation. It can often be difficult to ensure that projects have stability and longevity. In this talk, I will discuss ongoing work on an NSF “Pathways in Open-Source Ecosystems” grant focused on the {data.table} package. Like many R packages, {data.table} has incredible functionality and thousands of users - but no cohesive community or governance structure to support it long-term. We are working to build this ecosystem. I will provide my advice and insight for key aspects of a sustainable open-source project: Engaging casual users, supporting developers, generating content, emphasizing education, and creating a home base for the community.

Session notes

Essence of talk: Don’t be afraid to contribute - make issues on GitHub, write blogs about the work you do, etc.

Suggests watching The unreasonable effectiveness of public work - David Robinson - Rstudio conference 2019

Back to top