UseR! 2017 Recap
I recently returned from a week at the UseR! 2017 conference in Brussels, which was a great opportunity to catch up on the latest trends in the R world. This conference was noticeably different from the 2015 Aalborg conference in the demographics of the audience: at prior conferences, the attendees were overwhelmingly PhD faculty or PhD candidates, but at this conference many, if not the majority, were consultants and practitioners from industry. There is a lot to cover, so I’ll split things into a few categories:
- Natural Language Processing
- A Tidal Wave of Mapping
- Shiny Stuff
- Docker Was Common
- Mixed Integer Programming
- Parallel Processing
- Making Web Sites Accessible to the Blind
At each of the UseR! conferences, you can look back and see a trend in the talks submitted; here are the trends from the conferences that I have attended:
- Ames, 2007–this was really the year of critical mass. Everyone talked about this being the T-shirt to save to prove that you had attended. Sweave and ODFweave were big topics, but did not dominate discussion as they would in 2011. The difficulty of managing package dependencies and support was a major discussion point; most packages were developed in academia, but faculty had (and have) no incentive for ongoing support. There were big pharma and finance contingents at this conference.
- Dortmund, 2008–I think infrastructure, computational speed and parallelism were the major topics. This was the year that I remember people first talking about alternative interpreters and the possible routes (and potential funding) for converting the interpreter from 32 to 64 bits.
- Rennes, 2009–this was the year the conversion from plot to ggplot2 began. There was a lot of discussion about lattice as well, but much more about ggplot2.
- Warwick, 2011–reproducible research and integrated development environments (IDEs) were the major topics here. RStudio, other IDEs, Sweave and knitr were big topics. The pharma crowd generally did not make it to this conference, as most were going to the BioC conference by this point.
- Albacete, 2013–dplyr was introduced. The conversion to 64-bit and release of 3.0 were the major topics, along with parallelism in graphics cards (cuda). Improvements to CRAN and package dependency management were big discussion points. This was the first year that the finance crowd largely did not show up, as most were going to the RFinance conference at this point.
- Aalborg, 2015–the Microsoft acquisition of Revolution Analytics was huge news, as was all manner of interactive visualizations. Alternative interpreters were a major point of discussion at this conference. Docker was discussed in several settings.
At Brussels, 2017, there was surprisingly little discussion of alternative interpreters, static and interactive graphics, or IDEs. The three themes that I could pick out were
- Natural language processing (NLP)
- Mapping
- Extensions to Shiny
The next sections will talk about these major themes and some individual lectures that I think should get outsized attention.
Natural Language Processing
In this year’s tutorial list, there was a half-day tutorial on natural language processing of text, and a four-presentation session on NLP-related topics. It is clear that analysis of text is beginning to enter the mainstream. There were two major packages discussed:
- coreNLP, by Taylor Arnold and Lauren Tilton. This was the primary package discussed in the Tuesday afternoon tutorial led by the package authors, both of the University of Richmond. They are also the co-authors of a book, Humanities Data in R. Tilton is a historian who uses text analysis to look at authorship and other topics of interest in analyzing historical documents. Arnold is a statistician whose work is primarily algorithmic. The package is quite robust, but takes some work to learn to use effectively.
- tidytext, by Julia Silge, co-author of Text Mining with R. Silge is a data scientist at Stack Overflow. The tidytext package appears to be easier to use but somewhat less capable for complex analysis.
It is clear that text mining is now a mainstream application.
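To give a flavor of why tidytext is considered the easier entry point, here is a minimal sketch of its core workflow on a made-up two-document corpus (the text and column names are mine, not from the talk):

```r
library(tidytext)
library(dplyr)

# A toy corpus: two short "documents"
docs <- tibble(
  doc  = c(1, 2),
  text = c("R makes text mining approachable",
           "Tidy tools make text analysis simple")
)

# Tokenize into one word per row, drop common English stop words,
# and count word frequencies across the corpus
docs %>%
  unnest_tokens(word, text) %>%
  anti_join(stop_words, by = "word") %>%
  count(word, sort = TRUE)
```

Because each step returns an ordinary data frame, the rest of the tidyverse (grouping, joining, plotting) applies without any special text-specific machinery.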
A Tidal Wave of Mapping
Mapping is another application that has improved to the point that it is mainstream, as there were two mapping-related tutorials and several presentations. Much of the discussion was on the migration of mapping tools from the older sp spatial object types to the newer sf (tidy) spatial object types. If writing new code, you definitely want to use sf object types wherever possible.
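As a small illustration of what makes sf attractive, here is a sketch that builds an sf object from a plain data frame of coordinates (the city data here is illustrative, not from any presentation):

```r
library(sf)

# Hypothetical point data: two conference cities with lon/lat coordinates
cities <- data.frame(
  name = c("Brussels", "Aalborg"),
  lon  = c(4.35, 9.92),
  lat  = c(50.85, 57.05)
)

# Convert to an sf object with a WGS84 coordinate reference system
cities_sf <- st_as_sf(cities, coords = c("lon", "lat"), crs = 4326)
print(cities_sf)

# Legacy sp objects can be migrated the same way:
# new_sf <- st_as_sf(old_sp_object)
```

The key design point is that an sf object is still a data frame with a geometry column, so dplyr verbs and ggplot2 work on it directly, which is exactly what the older sp classes made awkward.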
Shiny Stuff
Shiny has been around for several years now, but some people have stayed away from it due to the high cost of the Enterprise version with encryption and authentication. Those problems are much easier to solve with some of the new packages presented at the conference.
Secure the Open Source Version of Shiny–ShinyProxy
ShinyProxy is a new package that replaces the proxy server that is internal to Shiny. It allows you to run a cluster of Shiny servers and to implement SSL encryption without getting the enterprise version of Shiny. This will make Shiny implementations much, much easier.
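ShinyProxy itself is configured with a YAML file rather than R code. Below is a minimal sketch of an application.yml in the format the ShinyProxy documentation uses; the app id, container image, and LDAP URL are placeholders of mine, not details from the talk:

```yaml
proxy:
  title: Internal Shiny Apps
  port: 8080
  # Authentication happens in ShinyProxy, not inside Shiny itself
  authentication: ldap
  ldap:
    url: ldap://ldap.example.com:389/dc=example,dc=com
  docker:
    url: http://localhost:2375
  specs:
    - id: sales-dashboard
      display-name: Sales Dashboard
      # Each Shiny app runs in its own Docker container
      container-image: example/sales-dashboard
```

Each entry under specs becomes a separately authenticated, separately containerized Shiny app, which is how ShinyProxy sidesteps the single-process limits of the open-source Shiny server.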
Speed up Database Connections in Shiny–pool
In large Shiny implementations, the cost of opening and closing database connections can become a big performance problem. The pool package implements connection pooling–something used in transaction processing systems for years–in Shiny. If you are doing a large Shiny implementation, this is an important new tool.
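A minimal sketch of the pattern, using an in-memory SQLite database so the example is self-contained (the table and output names are mine):

```r
library(shiny)
library(pool)
library(DBI)

# Create one pool of connections when the app starts;
# swap RSQLite for your production driver (Postgres, MySQL, ...)
pool <- dbPool(RSQLite::SQLite(), dbname = ":memory:")
dbWriteTable(pool, "events", data.frame(id = 1:3, value = c(10, 20, 30)))

ui <- fluidPage(tableOutput("tbl"))

server <- function(input, output, session) {
  output$tbl <- renderTable({
    # Each query checks a connection out of the pool
    # and returns it automatically when done
    dbGetQuery(pool, "SELECT * FROM events")
  })
}

# Close all pooled connections when the app shuts down
onStop(function() poolClose(pool))

shinyApp(ui, server)
```

The point of the design is that every session shares the pool instead of opening its own connection, so connection setup cost is paid once rather than per request.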
Docker Was Common
In Albacete (2013) there was a lot of discussion about the use of R in finance applications and the problems of reproducibility of calculations for regulatory compliance. No one had a clean solution, and the best solution appeared to be the use of Virtual Machines. In Aalborg (2015), there were several presentations that discussed Docker as a way to help with the configuration management problems common to regulatory compliance. In Brussels, Docker was a theme underlying numerous presentations. This is clearly part of the mainstream skill set at this point.
The tidy/tidyverse approach to data structures has become the de facto standard, as was clear in numerous presentations.
Mixed Integer Programming
There was a single presentation on a pair of new packages called ompr and ompr.roi by Dirk Schumacher that implement easy-to-use model definition functions for mixed integer linear programs. This session was sparsely attended, but the people who were there–and who stayed to talk to the author–were key players in the R world. This package is a huge deal for people (like me) who come from an operations research background, and it will make a number of statistical and analysis methods much easier to implement.
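To show what "easy to use" means here, this is a sketch of a tiny 0/1 knapsack model in the ompr pipeline style (the item values and weights are made up; the GLPK solver is one of several ROI plugins you could use):

```r
library(ompr)
library(ompr.roi)
library(ROI.plugin.glpk)
library(magrittr)

# Pick items to maximize total value subject to a weight limit of 10
value  <- c(10, 13, 7)
weight <- c(5, 8, 4)

model <- MIPModel() %>%
  add_variable(x[i], i = 1:3, type = "binary") %>%
  set_objective(sum_expr(value[i] * x[i], i = 1:3), "max") %>%
  add_constraint(sum_expr(weight[i] * x[i], i = 1:3) <= 10)

result <- solve_model(model, with_ROI(solver = "glpk"))
get_solution(result, x[i])
```

The model reads almost like the algebraic formulation, which is exactly what was missing from the older R interfaces to MIP solvers.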
Parallel and Cloud
Although not a development that will change the content of next year’s UseR! conference, the doAzureParallel CRAN package will make parallel processing at cloud scale much easier.
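The appeal is that the package plugs into the familiar foreach backend mechanism. A sketch of the workflow, with the cluster details left to a config file you fill in with your own Azure credentials and VM sizes (file names here are the package defaults, not from a talk):

```r
library(doAzureParallel)

# Generate JSON templates, then edit them with your Azure Batch
# credentials and desired node pool before running the rest
generateCredentialsConfig("credentials.json")
generateClusterConfig("cluster.json")
setCredentials("credentials.json")

# Provision the cloud cluster and register it as a foreach backend
cluster <- makeCluster("cluster.json")
registerDoAzureParallel(cluster)

# The familiar %dopar% loop now runs on cloud nodes
results <- foreach(i = 1:100) %dopar% {
  mean(rnorm(1e6))
}

stopCluster(cluster)
```

Existing foreach code needs essentially no changes; only the backend registration differs from running on local cores.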
Making Web Sites Accessible to the Blind
One of the most enlightening presentations that I attended was Jonathan Godfrey’s “Interactive Graphs for Blind and Print Disabled People.” The short story is that the screen readers used by blind data scientists cannot do much with .png and other bitmap formats, but can read and translate A LOT from .svg files. However, the .svg files created with ggsave are not useful for screen readers, while files created with gridSVG can be interpreted by screen readers.
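The workflow change is small: draw the plot as usual, then export the live grid scene with gridSVG instead of saving with ggsave. A sketch using a built-in dataset (the file name is mine):

```r
library(ggplot2)
library(gridSVG)

# Draw the plot on the current grid-based graphics device
p <- ggplot(mtcars, aes(wt, mpg)) + geom_point()
print(p)

# Export the grid scene as structured SVG; unlike ggsave's flat
# output, this preserves the grob tree as named SVG elements,
# which screen readers can traverse
grid.export("mpg-by-weight.svg")
```

Note that grid.export works from the currently open device, so the plot must be printed before exporting.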
As a result of this lecture, I am changing my workflow.
UseR! 2017 did not disappoint.