FOSS4Spectroscopy Update

FOSS
What’s changed in a year?
Author: Bryan Hanson

Published: August 15, 2023

Yesterday I pushed a major update to the FOSS for Spectroscopy web site. Remember that this is a lightly curated and imperfect process; I have some scripts that automate the discovery of packages, but there is still a considerable amount of manual inspection and decision making. If you think I’ve missed a package, please let me know.

It’s been nearly a year since the last update, and there are a number of new entries. Let’s do a quick comparison of November 2022 and August 2023: back then there were 246 packages, and now there are 287, a net gain of 41. Figure 1 shows a Venn diagram of the changes.

Figure 1: Venn diagram comparing the two sets of packages
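A two-set Venn diagram like Figure 1 has three regions: packages that were dropped, packages present in both snapshots, and packages that are new. The sketch below is a minimal illustration, assuming each snapshot is available as a set of package names. The names are hypothetical placeholders, not actual entries from the site, and the helper `venn_regions` is likewise made up for illustration.

```python
# Hypothetical placeholder names; the real lists come from the FOSS for Spectroscopy site.
nov_2022 = {"pkgA", "pkgB", "pkgC", "pkgD"}
aug_2023 = {"pkgB", "pkgC", "pkgD", "pkgE", "pkgF"}


def venn_regions(old: set[str], new: set[str]) -> dict[str, set[str]]:
    """Split two package sets into the three regions of a two-set Venn diagram."""
    return {
        "only_old": old - new,  # listed in the earlier snapshot only (dropped)
        "both": old & new,      # listed in both snapshots
        "only_new": new - old,  # listed in the later snapshot only (added)
    }


regions = venn_regions(nov_2022, aug_2023)
for name, members in regions.items():
    print(name, len(members), sorted(members))
```

With the real package lists, `only_old` plus `both` would account for the 246 packages of November 2022, and `both` plus `only_new` for the 287 of August 2023.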

Package Language

Table 1 shows that spectroscopy software development is clearly active in the Python ecosystem (162 to 198 packages), while R has stalled (60 to 61). Interpreting this observation is challenging. A few thoughts:

  • One could argue that the R ecosystem for spectroscopy is mature, so further development will naturally be limited.
  • The growing popularity of the Python language surely contributes significantly.
  • One motivation for writing a package is to learn the language and its package delivery system. There’s nothing wrong with that motivation; however, it leads to packages whose features largely overlap.
Table 1: Package language, 2022 vs 2023.
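| Language   | Nov 2022 | Aug 2023 |
|------------|---------:|---------:|
| Python     |      162 |      198 |
| R          |       60 |       61 |
| C++        |        4 |        5 |
| Java       |        4 |        4 |
| Julia      |        4 |        5 |
| C          |        2 |        2 |
| Qt         |        2 |        2 |
| C-shell    |        1 |        1 |
| C#         |        1 |        2 |
| Fortran    |        1 |        1 |
| Go         |        1 |        1 |
| HTML       |        1 |        1 |
| JavaScript |        1 |        2 |
| TypeScript |        1 |        1 |
| XML        |        1 |        1 |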
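As a quick check on the totals quoted above, the language counts in Table 1 can be summed and the changes computed. This is plain Python with the counts transcribed by hand from the table; the variable names are illustrative only.

```python
# Package counts by language, transcribed from Table 1: (Nov 2022, Aug 2023).
counts = {
    "Python": (162, 198),
    "R": (60, 61),
    "C++": (4, 5),
    "Java": (4, 4),
    "Julia": (4, 5),
    "C": (2, 2),
    "Qt": (2, 2),
    "C-shell": (1, 1),
    "C#": (1, 2),
    "Fortran": (1, 1),
    "Go": (1, 1),
    "HTML": (1, 1),
    "JavaScript": (1, 2),
    "TypeScript": (1, 1),
    "XML": (1, 1),
}

# Column totals should match the package counts given in the text.
total_2022 = sum(old for old, _ in counts.values())
total_2023 = sum(new for _, new in counts.values())
net_growth = total_2023 - total_2022
print(f"Total packages: {total_2022} -> {total_2023} (net {net_growth:+d})")

# Absolute and relative change for the two largest ecosystems.
for language in ("Python", "R"):
    old, new = counts[language]
    change = new - old
    print(f"{language}: {change:+d} ({100 * change / old:.1f}%)")

# Fraction of the net growth contributed by Python packages.
python_change = counts["Python"][1] - counts["Python"][0]
print(f"Python share of net growth: {python_change / net_growth:.0%}")
```

The columns sum to 246 and 287, and Python accounts for 36 of the 41 net additions, while R gained a single package.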

Package Focus

Table 2 shows the change in package focus. Most categories grew modestly; NMR (+10) and Data Sharing (+8) gained the most packages, and only Muon shrank (1 to 0).

Table 2: Package focus, 2022 vs 2023.
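| Category         | Nov 2022 | Aug 2023 |
|------------------|---------:|---------:|
| Any              |       32 |       34 |
| Data Sharing     |       33 |       41 |
| EEM              |        3 |        3 |
| EPR, ESR         |        5 |        7 |
| IR (all flavors) |       35 |       38 |
| Raman            |       28 |       34 |
| UV-Vis, UV, Vis  |       19 |       20 |
| LIBS             |        3 |        5 |
| Muon             |        1 |        0 |
| PES              |        1 |        2 |
| XRF, XAS         |       10 |       15 |
| NMR              |       87 |       97 |
| Time Series      |        3 |        3 |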

Personal Perspective

I’ve curated this site for several years now, and one thing is clear: there is a lot of duplicated effort and overlapping functionality. I mentioned a few reasons for this above, but at some point it makes more sense to add to an existing package than to write one from scratch. That can only happen if people look for existing software first, which, of course, is one purpose of the FOSS for Spectroscopy web site.

As I see it:

  • One-dimensional spectroscopic techniques produce x,y data, usually spectra¹. A collection of such spectra sharing a common x axis can be stored in a matrix, for example with one spectrum per row. In terms of data organization, an IR spectrum is no different from a UV-Vis spectrum.
  • Two-dimensional techniques produce data that can be stored in one of two ways:
    • A single spectrum (or a single wavelength) can be stored as a matrix, so a set of spectra is a stack of matrices (termed an array in some languages). Think of 2D NMR: each element of the stack is a single 2D spectrum.
    • Alternatively, individual spectra can be stored in a matrix and an additional data structure provides a key to how each spectrum relates to the others. Think of a Raman image: spectra are collected over a set of x,y locations.
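To make these layouts concrete, here is a minimal sketch using only the Python standard library. The class names (`Spectra1D`, `Spectra2DStack`, `SpectralImage`) and field names are hypothetical and not taken from any existing package; a real implementation would more likely use NumPy arrays, but plain lists keep the example dependency-free.

```python
from dataclasses import dataclass


@dataclass
class Spectra1D:
    """Many 1D spectra on a shared x axis: one row of `y` per spectrum."""
    x: list[float]        # e.g. wavenumbers or wavelengths
    y: list[list[float]]  # matrix with shape (n_spectra, len(x))

    def __post_init__(self):
        for row in self.y:
            if len(row) != len(self.x):
                raise ValueError("every spectrum must match the shared x axis")


@dataclass
class Spectra2DStack:
    """2D spectra (e.g. 2D NMR): a stack of matrices, one matrix per spectrum."""
    f1: list[float]                 # axis for the rows of each matrix
    f2: list[float]                 # axis for the columns of each matrix
    stack: list[list[list[float]]]  # shape (n_spectra, len(f1), len(f2))

    def __post_init__(self):
        for m in self.stack:
            if len(m) != len(self.f1) or any(len(r) != len(self.f2) for r in m):
                raise ValueError("each 2D spectrum must be len(f1) x len(f2)")


@dataclass
class SpectralImage:
    """Raman-style image: a 1D-spectra matrix plus a key of (x, y) locations."""
    spectra: Spectra1D
    locations: list[tuple[int, int]]  # locations[i] is where spectra.y[i] was collected

    def __post_init__(self):
        if len(self.locations) != len(self.spectra.y):
            raise ValueError("need exactly one location per spectrum")

    def spectrum_at(self, loc: tuple[int, int]) -> list[float]:
        """Look up a spectrum by its x,y position via the location key."""
        return self.spectra.y[self.locations.index(loc)]


axis = [400.0, 410.0, 420.0]
ir = Spectra1D(x=axis, y=[[0.1, 0.5, 0.2], [0.2, 0.6, 0.1]])
image = SpectralImage(spectra=ir, locations=[(0, 0), (0, 1)])
print(image.spectrum_at((0, 1)))  # [0.2, 0.6, 0.1]
```

The image class illustrates the second option: the spectra themselves are an ordinary matrix, just as in the 1D case, and `locations` is the extra key that relates each row to an x,y position.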

This design decision is the core of building a package. Once you have decided on a structure:

  • You need import methods; these are always tedious to write.
    • Broadly accepted formats, like JCAMP-DX or plain old CSV.
    • Manufacturer-specific formats, some of which may be poorly documented.
  • You need processing methods.
    • Widely used methods, like normalization and smoothing.
    • Technique-specific methods, such as zero-filling.
  • You need analysis methods.
    • Common techniques like PCA.
    • Analysis unique to a specific technique.
  • You need visualization methods.
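The layers in this list can be sketched as small, independent functions that all operate on the same simple data structure. This is a hypothetical, dependency-free illustration: the function names are invented, the CSV layout (first column x, one further column per spectrum) is an assumption, peak picking stands in for a real analysis method such as PCA, and visualization is omitted.

```python
import csv
import io


def read_csv_spectra(text: str) -> tuple[list[float], list[list[float]]]:
    """Import: assumes column 1 is x and each further column is one spectrum."""
    rows = [[float(v) for v in row] for row in csv.reader(io.StringIO(text)) if row]
    x = [r[0] for r in rows]
    spectra = [[r[j] for r in rows] for j in range(1, len(rows[0]))]
    return x, spectra


def normalize_max(y: list[float]) -> list[float]:
    """Processing (generic): scale so the largest intensity is 1 (assumes max > 0)."""
    peak = max(y)
    return [v / peak for v in y]


def moving_average(y: list[float], width: int = 3) -> list[float]:
    """Processing (generic): centered moving average; edges use a shorter window."""
    half = width // 2
    out = []
    for i in range(len(y)):
        window = y[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out


def zero_fill(fid: list[complex], n: int) -> list[complex]:
    """Processing (technique-specific, e.g. NMR): pad a time-domain signal with zeros to length n."""
    return fid + [0j] * (n - len(fid))


def peak_position(x: list[float], y: list[float]) -> float:
    """Analysis (simple stand-in): x value at the maximum intensity."""
    return x[max(range(len(y)), key=y.__getitem__)]


data = "1,0,0\n2,1,0\n3,4,1\n4,1,4\n5,0,1\n6,0,0\n"
x, spectra = read_csv_spectra(data)
processed = [moving_average(normalize_max(s)) for s in spectra]
print([peak_position(x, s) for s in processed])  # [3.0, 4.0]
```

Because every function takes and returns plain x values and intensity lists, new processing or analysis steps can be added without touching the importer, which is the payoff of settling the data structure first.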

In an ideal world, a data storage structure is chosen and everything else can be built later, quickly at first and then more slowly. The reality, however, is that people keep reinventing most of the wheel. I suppose this is not too different from people inventing entirely new computer languages…

Footnotes

  1. I say “usually spectra” because for some instruments, depending upon the goal of the package, one may store raw data that must be transformed in a separate step. The best example is raw time-domain NMR data, which must be Fourier transformed into frequency-domain spectra before analysis.

Citation

BibTeX citation:
@online{hanson2023,
  author = {Hanson, Bryan},
  title = {FOSS4Spectroscopy {Update}},
  date = {2023-08-15},
  url = {http://chemospec.org/posts/2023-08-15-F4S-Update/F4S-Update.html},
  langid = {en}
}
For attribution, please cite this work as:
Hanson, Bryan. 2023. “FOSS4Spectroscopy Update.” August 15, 2023. http://chemospec.org/posts/2023-08-15-F4S-Update/F4S-Update.html.