NMRduino is maturing rapidly! If what I’m doing is at all interesting to you, and you don’t know about the NMRduino Project and their recent publication, be sure to check it out. It’s much more sophisticated than what’s going on here, and will be available soon.
In the previous post on bitwise operators in C I detailed some of the machinations needed to control the ADC on an Arduino. After considerable work, that knowledge has been put to use to develop a working receiver system (though more work will be needed to perfect it). In the process, I have fine-tuned the code needed to control the instrument and collect the data in a usable form.
The Bnmr software is available on Github. The hardware used for testing and development is shown in Figure 1.
Once the proper bits were set so the ADC would collect data, I first used a simple voltage divider to generate a constant ADC signal, adjustable via a potentiometer (if one doesn’t provide some kind of input, the ADC output drifts around). With a signal available, there were many rounds of code revision so that a specified number of data points could be collected and stored somewhere.^{1}
In terms of storage, there were issues. The Arduino has very little actual memory, so the amount of data that can be “stored” on board is very small. As a result, this data has to be quickly moved somewhere else with significant memory. The solution to the transient data storage is a ring buffer; I was able to implement the code found on Wikipedia in C without too much trouble. The idea behind a ring or circular buffer is that data is stored in a fixed-size buffer, and added and removed in a coordinated manner via indices. However, in the big picture, data must be removed from the ring buffer as fast as or faster than it is put in, otherwise data is overwritten. And it turns out that the Arduino ADC can really pump out data. In order to keep the ring buffer from filling and overwriting (which is treated as an error), I had to collect data from the ADC at a lower rate than it can produce numbers, for instance every 10th reading.^{2}
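The ring buffer idea can be sketched in plain C. This is a generic illustration, not the actual code in Bnmr; the names `RingBuffer`, `rb_put` and `rb_get` are my own. The "refuse when full" behavior mirrors the treat-overwrite-as-an-error policy described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RB_SIZE 8 /* fixed capacity; this scheme keeps one slot empty */

typedef struct {
    uint16_t data[RB_SIZE]; /* 10-bit ADC readings fit in uint16_t */
    int head;               /* index of next write */
    int tail;               /* index of next read */
} RingBuffer;

/* empty when head == tail; full when advancing head would hit tail */
bool rb_put(RingBuffer *rb, uint16_t value) {
    int next = (rb->head + 1) % RB_SIZE;
    if (next == rb->tail) return false; /* full: refuse rather than overwrite */
    rb->data[rb->head] = value;
    rb->head = next;
    return true;
}

bool rb_get(RingBuffer *rb, uint16_t *value) {
    if (rb->head == rb->tail) return false; /* empty */
    *value = rb->data[rb->tail];
    rb->tail = (rb->tail + 1) % RB_SIZE;
    return true;
}
```

In an ISR-fed design, `rb_put` would be called from the ADC interrupt and `rb_get` from the main loop that writes to the SD card; the indices coordinate the two sides without locking as long as each side touches only its own index.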
The second problem was what to do with the data that was emptied out of the ring buffer. I spent a lot of time trying to send it to the serial port, so I could capture it from there. However, Bnmr also sends a lot of messages about various events to the serial port. These messages inform the user about what is happening and also provide troubleshooting guidance. Ultimately, it was not possible to capture the data this way – the messages invariably introduced problems with the formatting of the data. The solution was to add a micro SD card breakout board to store the data on the fly, effectively separating the message stream from the data stream. Before I settled on that approach, I also tried to use R to both send messages and capture the data.^{3} In addition, I tried using a shell script and a terminal emulator to do the same. Neither was completely successful when messages and data were mixed. However, the shell script experience proved helpful in developing the final, successful approach. Another problem with having both messages and data in the same serial stream was that macOS has a nasty habit of resetting the high baud rates desirable for data collection back to lower rates. This is discussed in various forums and workarounds exist, but I could not get the overall process to be reliable and robust.
With functioning software and a method to control the overall acquisition process in hand, I used the PicoScope to generate a sine wave (Figure 2). Bnmr was compiled and uploaded via a shell script calling arduino-cli (included in the repo, see Listing 1). Control was then transferred to picocom, a terminal emulation program, and the start signal was sent to the Arduino. Once the scans completed, the micro SD card was moved from the Arduino to a dongle connected to the laptop, and the data analyzed using R as shown later.
Whatever is typed in the picocom terminal/window is sent to the serial port and then to the Arduino. All messages sent by the Arduino are echoed in the picocom window and saved to a message log file. A typical output is in Listing 2.
The data log file is a comma-separated file with an entire FID/scan on one long line. There is a blank line between each data line. This is stored on the micro SD card in a file whose name is provided by the user in user_input.h (this is where all user-modifiable parameters are given). We can read in the first two scans and plot the early points as follows (Figure 3).
dat <- readLines("FID_CSV")
res1 <- as.numeric(unlist(strsplit(dat[1], ", ")))
# skipping dat[2] as it is a blank line
res2 <- as.numeric(unlist(strsplit(dat[3], ", ")))
plot(x = 1:length(res1), y = res1, type = "b", xlim = c(1, 25),
  xlab = "Index", ylab = "ADC Reading")
lines(x = 1:length(res1), y = res2, col = "red")
Since there is no coordination (i.e. no common time base) between the generated signal and the ADC data collection, the two sample scans are offset slightly. A common time base is very important for an NMR, so this will be one of the next items for focus.
Right now, a fixed number of data points are collected in whatever time it takes. This needs to be modified so that the data points are collected over a fixed amount of time, specified by the user.↩︎
There’s a potential problem here, and that is one must collect enough points to satisfy Nyquist’s criterion in order to faithfully represent a sine wave. Preliminary experiments suggest that there are plenty of data points because the ADC is extremely fast.↩︎
I spent considerable time writing an R
package which I named UtiliDuino for this purpose, but ultimately it was not the best solution.↩︎
@online{hanson2024,
author = {Hanson, Bryan},
title = {EFNMR {Part} 3: {Receiver} {Software}},
date = {2024-04-16},
url = {http://chemospec.org/posts/20240417EFNMRBuild3/EFNMRBuild3.html},
langid = {en}
}
The definitions of the bitwise C
operators can be found in numerous places, stated with various levels of clarity and understandability. Sometimes the definitions are very terse and seemingly quite clear, but after reading, one simply doesn’t know how to use it. The revered text known as “K & R” doesn’t even devote much space to them, though that may be because microcontrollers were a relatively new thing at the time of Kernighan and Ritchie (1988). The Arduino reference documents give quite a bit more detail but don’t have the complexity seen in the wild.
The following gives my own interpretation and understanding of the individual operators. To be clear, these definitions don’t really give a sense of why they might be useful or how one would use them.
A key thing to note is that these operators compare two bits (each either 0 or 1) and return an updated bit. The exceptions are NOT, which operates on a single bit, and the shift operators, which move a set of bits rather than compare them.
The reality is that one rarely sees these operators used on a single bit, even NOT. More often, one sees them applied to a byte, a set of 8 bits residing contiguously in memory. Those bytes, at least in the current use, turn out to be registers on the Arduino, our next topic.
The ATmega328P microcontroller used in the Arduino Uno has several registers that control the ADC:^{2}
We’ll use ADCSRA as our example. ADCSRA is of course an acronym. If you look at the iom328p.h file where these things are defined, you find that ADCSRA is an alias for a specific memory address (Figure 1).^{3} It is the address of a single byte, composed of 8 bits, numbered 0–7. In the datasheet we can see what is stored in this register. Each of the individual bits has a name, for instance ADEN, which stands for “ADc ENable”, and in the header file the name ADEN is aliased to bit 7 (Figure 2). So we have a named one-byte register, and each bit has its own name to make remembering its role easier. These are the bits we need to control with the bitwise operators in order to configure the ADC.
As I hinted at earlier, what people actually write is rather different from the simple definitions seen in the reference documents (or my version above). So let’s explore these wild-type examples in detail.
One simple example often seen doesn’t even use the bitwise operators.
ADCSRA = 0;
In this case, the right-hand-side (RHS) 0 is interpreted as an 8 bit binary number, 0000 0000, and this sets all 8 bits to zero at once. This incantation is probably most appropriate to reset the entire register, as all zeros is the default setting for this particular register (though not necessarily other registers).
If you know the value for every bit you want to set, and want to set them all at once, you can use a binary literal:
ADCSRA = B00101010; // prefix binary number with B, or
ADCSRA = 0b00101010; // prefix binary number with 0b
The downside here is that future readers of your code have to look up the details of a register’s bit settings every time they read your code. Other methods discussed here use aliases for particular bits (e.g. ADEN) which provide at least some mnemonic assistance. Binary literals are only supported in more recent versions of C, but you are likely to be using such a version.
ADCSRA |= (1 << ADEN);
In this incantation there are several interesting things going on. Let’s unpack it starting from the RHS. We see the expression (1 << ADEN), which uses the left shift operator. This means take 1 in binary, 0000 0001, and shift the 1 left ADEN places. If we look at either Figure 2 or Figure 1, we see that ADEN is 7, so we shift the first bit left 7 places, which gives 1000 0000 in binary. This is a “bit mask”; it’s used in the next step.
The operator |= is a variation on the OR operator. It means take whatever is on the RHS, OR it against the left-hand-side (LHS), and put the result in the LHS.^{4} What is the current value of ADCSRA on the LHS? We don’t know in this simple example; presumably you would know in a real-life example. Whatever it is, when we OR it with the RHS, bit 7, ADEN, gets set to 1, because of how OR is defined. So bit 7 is set to 1, and all other positions are unchanged.
xxxx xxxx // whatever is in ADCSRA
1000 0000 // bitmask from RHS
1xxx xxxx // result of OR (used to overwrite existing ADCSRA)
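This OR behavior can be verified off the Arduino with plain C, using an ordinary byte variable in place of the ADCSRA register (the function `set_aden` is a name I made up for illustration):

```c
#include <assert.h>
#include <stdint.h>

/* set bit 7 (ADEN's position), leaving all other bits alone */
uint8_t set_aden(uint8_t reg) {
    return reg | (1 << 7); /* OR against the 1000 0000 bit mask */
}
```

For example, `set_aden(0x2A)` takes 0010 1010 to 1010 1010: bit 7 is now 1 and the other bits are untouched.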
A more involved example using direct assignment as well as bitwise operators is:
ADCSRA = (1 << ADPS2) | (1 << ADPS1) | (1 << ADPS0);

which can be unpacked as three bit masks, OR’d against each other to get a final result that is put directly into ADCSRA. Using the values of ADPS*, we have:
0000 0001 // 1 << ADPS0 (note ADPS0 = 0 so this is no shift at all)
0000 0010 // 1 << ADPS1
0000 0100 // 1 << ADPS2
0000 0111 // result put directly into ADCSRA overwriting what is there originally
Note that the result overwrites the current value of ADCSRA
; the four most significant bits are set to zero, regardless of whatever value was there. The next example shows you how to avoid that.
Almost the same action can be accomplished with the following code, except it preserves the current settings in ADCSRA and uses a helper function, bit(), which is specific to Arduino:

ADCSRA |= bit(ADPS0) | bit(ADPS1) | bit(ADPS2);
bit() is an Arduino function that takes an integer argument and returns an 8 bit value with 1 in the position given by the argument, and zeros elsewhere.^{5} Thus it unpacks to:
0000 0001 // bit(ADPS0)
0000 0010 // bit(ADPS1)
0000 0100 // bit(ADPS2)
// the above 3 lines create the same bitmasks as in Example 2; together they become:
0000 0111 // result of OR the above 3 bitmasks
xxxx xxxx // whatever is in ADCSRA
xxxx x111 // result of OR ADCSRA against 0000 0111
In the previous two examples 1 << ADPS0 or bit(ADPS0) does very little since ADPS0 is 0. However, many coders seem to prefer a little verbosity to make clear what they are trying to achieve.^{6}
Let’s say you wanted to turn the ADC on if it was off, and off if it was on. This is a job for the ^ or toggle (XOR) operator. You can use ADCSRA ^= (1 << ADEN), which unpacks as follows (ADEN is 7):
1xxx xxxx // initial (on state) of the ADC; other bits unknown
1000 0000 // result of (1 << ADEN)
0xxx xxxx // result of toggling lines 1 and 2; put into ADCSRA; ADC is off
// or, starting with ADC off
0xxx xxxx // ADC is off
1000 0000 // result of (1 << ADEN)
1xxx xxxx // result put into ADCSRA; ADC is now on
Note that the x bits are toggled against 0, which means they are unchanged. See the truth table here.
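The toggle behavior is easy to check in plain C with an ordinary variable standing in for the register (`toggle_aden` is my own illustrative name):

```c
#include <assert.h>
#include <stdint.h>

/* toggle bit 7 (ADEN): on -> off, off -> on; other bits untouched */
uint8_t toggle_aden(uint8_t reg) {
    return reg ^ (1 << 7);
}
```

Toggling twice returns the register to its starting value, which is exactly the on/off/on behavior wanted here.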
The macro _BV(bit) is aliased to (1 << (bit)), and for Arduino you can use bitSet(x, n) or sbi(x, n) to write a 1 to the nth position of register x. Thus,

ADCSRA |= (1 << ADEN); // seen earlier
ADCSRA |= _BV(ADEN);
bitSet(ADCSRA, ADEN);
sbi(ADCSRA, ADEN);

are equivalent ways to set bit 7 in ADCSRA.
For Arduino, you also have bitClear(x, n), which writes a 0 at the nth position of register x, essentially the complement of bitSet(x, n). Internally, it is defined as ((x) &= ~(1 << (n))). Alternatively, one can use cbi(x, n), the complement of sbi(x, n). Let’s say you had 0000 0110 in ADCSRA and wanted to clear the 2nd bit.
bitClear(ADCSRA, 1); // expands to the following steps:
0000 0110 // initial value in ADCSRA
0000 0010 // value of bit mask (1 << 1)
1111 1101 // value of ~(1 << 1) where all bits have been toggled/flipped
0000 0100 // value after ANDing line 1 with line 3; a bit stays 1 only if it is 1 in both
Notice that the 2nd bit has been cleared. The = part of &= assigns the result to the LHS, namely ADCSRA.
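The same steps can be checked in plain C with an ordinary variable (`clear_bit` is an illustrative stand-in for what bitClear expands to):

```c
#include <assert.h>
#include <stdint.h>

/* clear bit n: what bitClear(x, n) expands to, i.e. x &= ~(1 << n) */
uint8_t clear_bit(uint8_t reg, int n) {
    return reg & ~(1 << n); /* invert the mask, then AND */
}
```

Running the example above, `clear_bit(0x06, 1)` takes 0000 0110 to 0000 0100, with bit 2 untouched.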
Note that sbi() and cbi() only work for certain registers on Arduino.
This StackOverflow Question has examples of more functions and an interesting discussion of pros, cons and caveats.
I modified the function found here to print register contents (well, bytes generally) in an easy-to-read format.
void print_bin(byte aByte) {
  for (int8_t aBit = 7; aBit >= 0; aBit--) {
    if (aBit == 3) {
      Serial.print(" "); // space between nibbles
    }
    Serial.print(bitRead(aByte, aBit) ? '1' : '0');
  }
  Serial.println("");
}
Let’s use it to check a set of operations which blend Example 2 and Example 3 above, and stick to pure C operations. This code chunk

ADCSRA = B10001000; // arbitrary initial value
print_bin(ADCSRA);
ADCSRA |= (1 << ADPS2) | (1 << ADPS1) | (1 << ADPS0);
print_bin(ADCSRA);
displays the following:
1000 1000
1000 1111
Use it to check your work!
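The same check can be run on a desktop with standard C. Here `format_bin8` is my own stand-in for print_bin, formatting into a string instead of printing to Serial, and the ADPS bits are 2, 1 and 0 as unpacked in Example 2:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* desktop stand-in for print_bin(): format a byte as "xxxx xxxx" */
void format_bin8(uint8_t b, char buf[10]) {
    int pos = 0;
    for (int bit = 7; bit >= 0; bit--) {
        if (bit == 3) buf[pos++] = ' '; /* space between nibbles */
        buf[pos++] = ((b >> bit) & 1) ? '1' : '0';
    }
    buf[pos] = '\0';
}
```

Starting from 0x88 (B10001000) and OR-ing in bits 2, 1 and 0 reproduces the 1000 1000 / 1000 1111 pair shown above.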
Let me state for the record that this is just a first version; additional complexity will almost certainly be needed later.↩︎
The details on each of these can be found on the datasheet which can be found via a search engine.↩︎
The header file is available many places on the internet.↩︎
All the operators can be used the same way: |=, ^=, &=, <<= and >>=. For example, C &= 2 should be thought of as C = C & 2. See this SO answer.↩︎
It’s essential to be careful with language to be clear. A byte is 8 bits, numbered from the right starting at position 0, i.e. 7 6 5 4 3 2 1 0. So the first bit is at position 0, etc. Thus bit(0) returns 0000 0001.↩︎
These three bits are used as a group to set the clock speed of the ADC, so it makes sense to make it clear you are using all three values together.↩︎
@online{hanson2024,
author = {Hanson, Bryan},
title = {Bitwise {Operators} in {C}},
date = {2024-01-30},
url = {http://chemospec.org/posts/20240130BitwiseOperators/BitwiseOperators.html},
langid = {en}
}
With the polarization coil completed, I decided to take a stab at the software to control the instrument. I felt like I needed to get a feel for how to work with the Arduino so I would understand what kinds of signals I could send to the electronics. In turn that would (ideally) make it easier to understand how the circuits work.
I started by studying Michal’s software (available here). Michal’s software is designed for use with students in a lab course and includes a Python GUI, the actual Arduino control software, and several utilities. One of the utilities is a separate pulse programming module that produces a file accessed by the GUI. At least that appears to be the big picture. Inspection of the Arduino software made it clear that I had, and have, a lot to learn. Arduino code is written in C++, which encompasses the earlier language C, which some have described as “dressed up assembly language”. Oh boy…
After studying some basic Arduino tutorials, I decided the best way to learn was to write my own software, starting with a simple case of an NMR-like interface that would turn Arduino pins on and off to control the various pieces of hardware I will eventually build. Turning pins on and off is really simple on the Arduino, that’s not the challenge. For this instrument, the challenge is that there are several events that occur one after the other on very short time scales. Roughly, one must turn the polarization coil on, then off, then turn on the transmitter and turn it off, and then turn on the receiver and listen. Due to the realities of electronics, there need to be short delays between some of these events so that the electronic signals can “warm up”, or “cool down”. To make this initial version manageable, I decided to not worry about the time scale in detail for now, and focus on building an extensible framework that takes NMR-like inputs to turn things on and off.
Since R is my computational lingua franca, I decided to think about how I would set up a series of events in R and calculate their on/off times given the duration (or length) of each event. This was quite straightforward: if you know the duration of each event, the on/off times can be computed with a cumulative sum.
#'
#' Convert a Named Vector Giving Event Durations to a Data Frame
#'
#' @param event_lengths Numeric. A named numeric vector giving the durations (lengths)
#' of a series of events which occur in the given order.
#' @return A data frame containing the on and off times for each event.
#'
event_length_to_event_on_off <- function(event_lengths) {
  off <- cumsum(event_lengths)
  on <- c(0, off[1:(length(off) - 1)])
  DF <- data.frame(event = names(event_lengths), on = on, off = off)
  DF
}
And then I needed a function to visualize the result, which is basically a sort of Gantt chart where the events never overlap.
#'
#' Create a Gantt Chart of NMR Event Timing
#'
#' @param my_events Data frame.
#' @return `ggplot2` object.
#'
events <- function(my_events = NULL) {
  p <- ggplot(my_events, aes(x = on, xend = off, y = event, yend = event))
  p <- p + geom_segment(linewidth = 8) + theme_bw()
  p <- p + labs(title = "NMR Event Timing", x = "time, microseconds", y = "")
  p <- p + scale_y_discrete(limits = my_events$event)
  p
}
Figure 1 shows these functions in action. So far, so good.
f <- 1e6 # conversion factor, seconds to microseconds
ev <- c(10 * f, 5, 1 * f, 5, 5 * f, 10 * f)
names(ev) <- c("pol_coil", "del_pt", "transmitter", "del_tr", "receiver", "relax_delay")
p1 <- events(event_length_to_event_on_off(ev))
p1
Next, I decided to write something more or less equivalent in C. This meant learning C. Suffice it to say, C provides none of the niceties of R. There are few atomic types in C, and in particular strings and arrays are not native entities. Instead, one must think in terms of pointers to particular memory addresses that hold the strings or arrays. So the entire paradigm is different, and requires thinking about solving problems in new ways. Overall, this has been a good experience. After a lot of struggle, I managed to write functions that carry out the equivalent of the R functions above, except instead of graphical output there is tabular output (there really is no graphical output in the usual sense for Arduino, so we need other ways of verifying our results). I won’t give details of this work here, as the next section reviews how it was implemented for Arduino.
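For flavor, here is a minimal sketch of how the cumulative-sum idea translates to C. This is my own reconstruction for illustration, not the actual project code; the function name and signature are mine:

```c
#include <assert.h>
#include <stddef.h>

/* compute on/off times as a running (cumulative) sum of event durations,
   the C analog of event_length_to_event_on_off() in R */
void event_length_to_on_off(const unsigned long *len,
                            unsigned long *on, unsigned long *off, size_t n) {
    unsigned long t = 0;
    for (size_t i = 0; i < n; i++) {
        on[i] = t;   /* each event starts where the previous one ended */
        t += len[i];
        off[i] = t;  /* and ends after its own duration */
    }
}
```

Note that where R returns a new data frame, the C version fills caller-supplied arrays through pointers, which is exactly the paradigm shift described above.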
The version of event timing in C was adapted to the Arduino with relatively minor modifications, mostly related to how results are printed to the console (the C and C++ languages for Arduino are specialized versions of the languages). I also wrote a system to control the starting and stopping of the scans, thinking ahead to how the program will actually be used. All user inputs are in a single file, including a simple version of a pulse program (tons of work will be needed on this piece in the future). My overall goal is to write an entire NMR control and acquisition program that runs completely on the Arduino. Well, almost completely: some other entity will have to slurp up the data coming from the Arduino, as there is very little memory on the Arduino. Not sure if this can be done, but that’s the goal. The code for this project is stored in a public repo here.
The output of a “run” on this “instrument” is shown in Figure 2. The table lists the event name, the Arduino pin that should be activated, and the on/off times for the events. Times are in milliseconds in the example, and are relevant for testing, not an actual NMR scan. A pin value of -1 indicates no pin is active; such an event is just a delay period so the (not yet built) electronics can settle.
This program was further tested by wiring the Arduino to a breadboard with a few LEDs and resistors to limit the current to the LEDs appropriately. The video below shows the program in action, doing two scans with the durations as shown in Figure 2. The pins from left to right represent polarization coil power, transmit, and receive signal (the latter of course should be listening, not powering something). As a proof of concept I’m pretty happy with this result.
So much to do, but I’m not in a hurry and can choose to do things in any order that inspires me:
@online{hanson2024,
author = {Hanson, Bryan},
title = {Building an {EFNMR} {Part} 2},
date = {2024-01-01},
url = {http://chemospec.org/posts/20240101EFNMRBuild2/EFNMRBuild2.html},
langid = {en}
}
I was inspired by the really simple EFNMR instrument developed by Andy Nichol (“Nuclear Magnetic Resonance for Everybody”). Nichol’s work made it clear that one could observe an NMR signal without complex equipment. As I did more reading however, I settled on following the build of Carl Michal (Michal (2010)) as it will allow for more complex experiments, and provides more opportunity to learn electronic circuits.
Michal’s design uses two coils: a polarization coil, and a transmit/receive (T/R) coil. This post will cover the construction of the polarization coil. Michal’s polarization coil is a three-layer solenoid constructed with 18 AWG magnet wire. Each layer is a separate wire but in operation the three layers are wired in parallel. I scaled the coil dimensions down somewhat so that I could use materials that are readily accessible to me.^{1} The plan is to use a 50 mL centrifuge tube as the sample holder. The sample will be placed in a T/R coil wound around a 1.25” schedule 40 PVC pipe. The T/R coil will be located inside the polarization coil, which will be wound on a 2” schedule 40 PVC pipe. The dimensions of these pipes were chosen to allow the sample to nest easily inside the T/R coil, which nests inside the polarization coil. Figure 1 shows a cross-section of the design.^{2}
The form for the polarization coil was made from a 12 cm length of 2” PVC pipe. Two retaining rings were very carefully cut from a 2” PVC coupling. The retaining rings were 1 cm wide. The parts are shown in Figure 2. The rings were then glued to the ends of the form using a minimal amount of standard PVC glue. The inner edges of the rings correspond to the original end of the coupling which provides a clean and straight edge where it will rest against the magnet wire. The ends of the assembly were lightly sanded. As built, the length available for the windings is 102 mm.
Next, three holes were drilled close to each of the retaining rings, about 1 cm apart. The magnet wire will pass through these holes, which will serve to keep the wire in place as it is wound. Figure 3 shows these holes. A short length of wire was placed in the holes as a “keeper” as the winding was carried out. This ensured that the winding for the first layer did not block the holes for the second and third layers of wire (Figure 4).
A winding jig was constructed from 1/4” hobby plywood. The base is 6 x 12”. Small nails and glue were used to assemble the sides and back. A 1/4” threaded rod serves as the rotational axis. Nuts and washers secure a simple handle as well as position the rod overall in the jig. Figure 5 and Figure 6 show the jig.
A holder for the wire spool was constructed with 1/16” x 1” aluminum bar. The bar was bent into a shape that would provide a way to apply friction to the sides of the spool, thus controlling the tension on the wire as it pays out. The spool is mounted on a 1/4” threaded rod and there are wingnuts on each side, which when tightened press the aluminum bar against the spool. The threaded rod does tend to unscrew as the wire is spooled out, but the process is slow enough that one can correct this as needed. If I were going to do this a lot I would replace the wingnut on the side that tends to unwind with two nuts locked against each other. The holder is loosely attached to the work bench so that it can pivot as needed to accommodate the changing angle of the wire as it moves across the form. Figure 7 shows the design.
The form was more or less centered on the threaded rod using a couple of wooden guide pieces. The winding process is shown in Figure 8. The wire for the first layer comes from inside the form and up through one of the holes and is wound on the form. The action of the keepers is apparent. The fingers are used to position the wire correctly. In principle tension on the wire is provided by tightening the wing nuts on the wire supply holder. However, I did not tighten them enough and I had to wrestle with getting layer one tight enough. This caused problems with the subsequent layers as you will see!
The completed layer one is shown in Figure 9. The winding looks even. Layer two is shown in Figure 10. Because layer one was a little loose, the wire for layer two would sometimes slip in between the wires of layer one and force them apart. This was exacerbated because I was using more tension on the wire supply for layer two. Clearly the layer is not even. In addition, winding layer two was more difficult because without the white background one cannot see the progress very well.
The problems only worsened with layer 3 (Figure 11). I am not happy with the final result, but the wire is positionally stable and it should carry out its function well enough. What I’ve learned here will help when winding the T/R coil.
The polymeric insulation on the leads was sanded off (Figure 12) and the resistance of each coil was measured. Each gave a resistance of about 0.7 Ω and there were no shorts between the layers.
The next step will be the construction of the polarization coil power supply, and integration of the Arduino controller. I’m not in a hurry!
Of course, there will be less sample and therefore a smaller signal.↩︎
The dimensions of schedule 40 PVC products are readily available online, which made planning the overall design much simpler.↩︎
@online{hanson2023,
author = {Hanson, Bryan},
title = {Building an {EFNMR} {Part} 1},
date = {2023-10-24},
url = {http://chemospec.org/posts/20231024EFNMRBuild1/EFNMRBuild1.html},
langid = {en}
}
As all organic chemists know, in NMR we use the n + 1 rule to determine splitting, and Pascal’s triangle as a mnemonic to remember the relative areas of the peaks within a multiplet. For instance, we expect the CH₃ group in ethanol to be a triplet with areas 1:2:1, due to it having two proton neighbors in the CH₂ group. We treat the two protons in CH₂ as magnetically equivalent.
The n + 1 rule works at typical fields used for structural determination, let’s say 60 MHz and above.^{1} At these fields one is working in the so-called “weak coupling” regime. However, as one lowers the field to really low values, one encounters the “strong coupling” regime, where one observes “J-coupled spectra” or JCS. Under strong coupling, the protons in ethanol are no longer magnetically equivalent, each of them couples differently to the other nuclei, and the n + 1 rule breaks down.
The strict requirement for JCS is that there be two or more protons attached to a spin-1/2 heteroatom and that the magnetic field be quite small. For a simple system, let’s say XH₂, the strict requirement to see separate lines for the no-longer-equivalent protons is that the J coupling be comparable to or larger than the relevant difference in Larmor frequencies – the defining condition of strong coupling.
If this seems a bit strange, well, 1) it is, and 2) it has always been the case that the “equivalent” protons in, for example, a CH₂ group do couple; we just don’t normally see it or worry about it.^{2}
How small does the magnetic field have to be for J-coupled spectra to appear? This is covered in detail in Appelt et al. (2010), but generally speaking JCS appear at very low fields, around the microtesla range.^{3} The magnetic field of the earth is around 50 µT, right in the sweet spot. The Larmor resonance frequency for ¹H at this field strength is around 2 kHz.
In the case of a system like XH_N, the number of lines that will be observed is

number of lines = 2 * sum over odd n of (N - n + 1)

where n runs over the set of odd numbers (for odd N, one evaluates until n = N; for even N, evaluate until n = N - 1). The leading multiplier of 2 accounts for the doublet due to J_HX. This formula doesn’t exactly roll off the tongue. We can evaluate it to get the first few terms:
N <- 5L # evaluate 1:N terms
no.lines <- rep(NA_integer_, N) # initialize storage
for (i in 1:N) {
  odd <- (1:i) %% 2
  n <- (1:i)[as.logical(odd)] # get odd n no larger than N
  no.lines[i] <- sum(i - n + 1) # take advantage of R's vectorization
}
names(no.lines) <- paste("N=", 1:N, sep = "") # pretty it up
no.lines <- no.lines * 2 # account for J_HX
no.lines
N=1 N=2 N=3 N=4 N=5
2 4 8 12 18
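As a cross-check, the same count translates directly to C; this sketch mirrors the R loop (the function name `n_lines` is mine):

```c
#include <assert.h>

/* lines observed for an XH_N system under strong coupling:
   2 * sum over odd n <= N of (N - n + 1); the 2 is the J_HX doublet */
int n_lines(int N) {
    int sum = 0;
    for (int n = 1; n <= N; n += 2) /* odd n only */
        sum += N - n + 1;
    return 2 * sum;
}
```

Evaluating for N = 1 through 5 reproduces the sequence 2, 4, 8, 12, 18 from the R output above.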
A couple of examples should clarify the situation. All of these will be from the perspective of observing ¹H.^{4} These examples are taken from Appelt et al. (2007).
At high field the spectrum of would be a symmetric doublet with a peak separation of .
In earth’s field, the spectrum is first split into a doublet by , but the spacing is not symmetric. Then, each part of the doublet is split further into two peaks, also asymmetrically and with varying linewidths. Figure 1 shows how the splitting changes as a function of field strength. Note that in the strong coupling region there are four peaks, as predicted above.
For the case of methanol in Earth’s field, the spectrum is first asymmetrically split by the with spacing . Then each part of the doublet is further split into four peaks. Figure 2 shows the EF spectrum of methanol. The asymmetry of the line spacing and line widths is apparent.
For a broad overview of this topic, take a look at Kaseman et al. (2020); for a detailed walkthrough of the theory with many more examples, see Appelt et al. (2007), and other papers by Appelt et al. Be prepared to spend some time with these papers.
60 MHz was chosen simply because commercial instruments have been available at that field forever.↩︎
The protons in something like actually do couple to each other. With a little trick, you can measure .↩︎
This is a general trend. The exact boundaries between the various coupling regimes depend on the nuclei involved, the coupling constants, and the peak separation in Larmor frequency (in Hz).↩︎
Remember, signals are very weak at EF so observing heteronuclei is significantly more challenging. See the previous post for details.↩︎
@online{hanson2023,
author = {Hanson, Bryan},
title = {The n + 1 Rule in {Earth’s} {Field} {NMR}},
date = {2023-09-18},
url = {http://chemospec.org/posts/20230918EFNMR2/EFNMR2.html},
langid = {en}
}
This is a good example of Free and Open Source Software (FOSS). ChemoSpec is licensed under GPL3 which permits any reasonable use as long as there is attribution to the original authors.
Check out the first line of the “About Delta” box:
@online{hanson2023,
author = {Hanson, Bryan},
title = {JEOL’s {Delta} {Now} {Includes} {ChemoSpec}},
date = {2023-08-23},
url = {http://chemospec.org/posts/20230823CSDelta/CSDelta.html},
langid = {en}
}
It’s been nearly a year, and there are a number of new entries. Let’s do a quick comparison of the results from November 2022 versus August 2023. Back in November 2022 there were 246 packages; nearly a year later there are 287. Figure 1 shows a Venn diagram of the changes.
Software development in spectroscopy is clearly actively occurring in the Python ecosystem; R has stalled (see Table 1). Interpretation of this observation is challenging. A few thoughts:
| Language   | Nov 2022 | Aug 2023 |
|------------|----------|----------|
| Python     | 162      | 198      |
| R          | 60       | 61       |
| C++        | 4        | 5        |
| Java       | 4        | 4        |
| Julia      | 4        | 5        |
| C          | 2        | 2        |
| Qt         | 2        | 2        |
| C shell    | 1        | 1        |
| C#         | 1        | 2        |
| Fortran    | 1        | 1        |
| Go         | 1        | 1        |
| html       | 1        | 1        |
| JavaScript | 1        | 2        |
| TypeScript | 1        | 1        |
| XML        | 1        | 1        |
Table 2 shows the change in package focus. Most categories grew modestly.
| Category         | Nov 2022 | Aug 2023 |
|------------------|----------|----------|
| Any              | 32       | 34       |
| Data Sharing     | 33       | 41       |
| EEM              | 3        | 3        |
| EPR, ESR         | 5        | 7        |
| IR (all flavors) | 35       | 38       |
| Raman            | 28       | 34       |
| UV-Vis, UV, Vis  | 19       | 20       |
| LIBS             | 3        | 5        |
| Muon             | 1        | 0        |
| PES              | 1        | 2        |
| XRF, XAS         | 10       | 15       |
| NMR              | 87       | 97       |
| Time Series      | 3        | 3        |
I’ve curated this site for several years now. One thing that is clear is that there is a lot of duplication of effort and features. I mentioned above a few reasons for this, but at some point it makes more sense to add to an existing package than to write one from scratch. However, this can only happen if people look around for existing software first. That of course is one purpose of the FOSS for Spectroscopy web site.
As I look at it,
This design decision is the core of building a package. Once you have decided on a structure:
In an ideal world, a data storage structure is chosen and everything else can be built later, quickly at first and then more slowly. The reality however is that people keep reinventing most of the wheel. I suppose this is not too different from people inventing entirely new computer languages…
I say “usually spectra” because for some instruments, depending upon the goal of the package, one may store raw data that must be transformed in a separate step. The best example is raw time-domain NMR data, which must be Fourier transformed into frequency-domain spectra before analysis.↩︎
@online{hanson2023,
author = {Hanson, Bryan},
title = {FOSS4Spectroscopy {Update}},
date = {2023-08-15},
url = {http://chemospec.org/posts/20230815F4SUpdate/F4SUpdate.html},
langid = {en}
}
Let’s take a closer look from first principles at what kinds of information one can glean from EFNMR. We’ll restrict our discussion to spin-1/2 nuclei with ~100% abundance, like ¹H, ¹⁹F, or ³¹P – you’ll see why soon enough. Table 1 gives some relevant physical parameters for these nuclei.
| Nucleus | Gyromagnetic ratio (10⁷ rad s⁻¹ T⁻¹) | Larmor freq. (MHz at 2.35 T) |
|---------|--------------------------------------|------------------------------|
| ¹H      | 26.7522                              | 100                          |
| ¹⁹F     | 25.1815                              | 94                           |
| ³¹P     | 10.8394                              | 40.5                         |
Excellent general references on NMR theory are Friebolin (Friebolin 2011) and Claridge (Claridge 2016).
The line width of an NMR signal is primarily dependent on the homogeneity of the field, which in the case of earth’s field is very good. Appelt et al. (2006) state that when observations are made >100 meters from buildings and ferrous structures,^{1} the homogeneity of the earth’s magnetic field over small sample volumes is excellent. They further state that when $T_2^*$ is on the order of seconds, line widths will be less than 0.1 Hz.^{2} This all sounds very promising: narrow lines imply good separation between peaks.
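The standard relation between line width and relaxation for Lorentzian lines, $\Delta\nu_{1/2} = 1/(\pi T_2^*)$, can be sketched in R. The $T_2^*$ values below are illustrative, not measured:

```r
# Line width at half height from the effective transverse relaxation time:
# delta_nu = 1 / (pi * T2star), assuming a Lorentzian line shape.
linewidth_Hz <- function(T2star_s) 1 / (pi * T2star_s)

T2star <- c(0.5, 1, 3.2, 5)  # seconds (illustrative values)
round(linewidth_Hz(T2star), 3)  # Hz; a T2star over ~3.2 s gives < 0.1 Hz
```

Note the reciprocal relationship: the longer the signal persists, the narrower the line.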
One of the characteristics of high-field NMR which makes it so useful is the dispersion of chemical shifts as a function of structure. Unfortunately, EFNMR has effectively zero chemical shift dispersion. The equation for computing the chemical shift, $\delta$, is:

$$\delta = \frac{\nu_{sample} - \nu_{ref}}{\nu_{spectrometer}}$$

where the units are: $\nu_{sample}$ and $\nu_{ref}$ in Hz, $\nu_{spectrometer}$ in MHz, giving $\delta$ in ppm, since $\delta$ is a field-strength-independent quantity. Taking $\nu_{ref}$ to be zero, e.g. TMS added to the sample, we can rearrange the equation to get $\nu_{sample} = \delta \times \nu_{spectrometer}$. Consider a compound whose methyl group has a chemical shift of 2.63 ppm. Using an earth’s field Larmor frequency of 19.1 kHz (0.0191 MHz), we can compute the shift in Hz as $2.63 \times 0.0191 \approx 0.050$ Hz. This is an extremely small value, smaller than the typical line width in earth’s field (so the promise of narrow line widths is not going to save us).
For further comparison, we can do the same calculation for a second compound whose shift is 4.90 ppm. The result, $4.90 \times 0.0191 \approx 0.094$ Hz, is separated from the first peak by only about 0.04 Hz. We can see that these two compounds with differing numbers of halogens, which would be trivial to distinguish with a low-field benchtop instrument operating at 80 MHz, are indistinguishable in earth’s field. This is due to the very small value of earth’s magnetic field.
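The peak-separation arithmetic can be sketched in R; the helper `sep_Hz` is just for illustration:

```r
# Peak separation in Hz between two chemical shifts (in ppm) at a given
# spectrometer frequency: sep_Hz = (delta2 - delta1) * nu0, with nu0 in MHz.
sep_Hz <- function(d1_ppm, d2_ppm, nu0_MHz) (d2_ppm - d1_ppm) * nu0_MHz

sep_Hz(2.63, 4.90, 0.0191)  # earth's field (19.1 kHz): a few hundredths of a Hz
sep_Hz(2.63, 4.90, 80)      # 80 MHz benchtop: a couple hundred Hz
```

The same ppm difference that is trivially resolved at 80 MHz collapses to well under a line width in earth’s field.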
While the chemical shift dispersion in earth’s field is clearly nil, heteronuclear J couplings are readily observed due to their greater magnitude, up to about 200 Hz. Appelt et al. (2006) give a number of interesting examples involving ¹⁹F- and ³¹P-containing compounds.
Basic NMR theory tells us that the energy difference $\Delta E$ between the two quantum states for a spin-1/2 nucleus is proportional to the field strength $B_0$:

$$\Delta E = \gamma \hbar B_0$$

where $\hbar$ is $h/2\pi$. A plot for ¹H is shown in Figure 1; the rightmost point corresponds to a 1,000 MHz instrument. Clearly as $B_0$ goes to zero, $\Delta E$ goes to zero in a simple linear fashion.
We can then relate the number of nuclei in the upper energy state, $N_{upper}$, to that in the lower energy state, $N_{lower}$, at thermal equilibrium as:

$$\frac{N_{upper}}{N_{lower}} = e^{-\Delta E / k_B T}$$

where $k_B$ is the Boltzmann constant and $T$ is the temperature in Kelvin. The ratio of the population states is nearly equal to one for any value of $B_0$, but of course gets even worse as $B_0$ decreases. This is the reason for the low overall sensitivity of NMR as an analytical technique. We can compute the ratio for ¹H at room temperature; we’ll compare the value for earth’s field to those of 100 and 1,000 MHz instruments:
 45 µT (Earth)  2.35 T (100 MHz)  23.49 T (1,000 MHz)
1.000000000000    0.999999999998      0.999999999982
As you can see, in earth’s field there is basically no difference in the two population states, meaning there is no signal to observe. Clearly a problem!
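The ratio computation can be sketched in R using standard SI constants and the ¹H gyromagnetic ratio from Table 1 (the exact digits printed depend on the constants and temperature assumed):

```r
# Boltzmann population ratio N_upper / N_lower = exp(-gamma * hbar * B0 / (kB * T))
gamma_1H <- 26.7522e7     # rad s^-1 T^-1
hbar     <- 1.054572e-34  # J s
kB       <- 1.380649e-23  # J K^-1

pop_ratio <- function(B0_T, T_K = 298) {
  exp(-gamma_1H * hbar * B0_T / (kB * T_K))
}

B <- c(45e-6, 2.35, 23.49)  # earth's field, 100 MHz, 1,000 MHz
pop_ratio(B)                # all barely below 1, worst at the lowest field
```

The ratio is always a whisker below one, and the excess population shrinks linearly as the field drops.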
If all the nuclei were in the lower energy state, we could measure the energy required to bump them up to the upper state, or more commonly, bump them up and then watch the energy given off as equilibrium returns. Unfortunately, the signal produced is proportional to the population difference $N_{lower} - N_{upper}$, which is effectively zero in earth’s field. At the same time, however, the more spins we have, the higher the signal will be. More total spins in the detection coil sweet spot will be helpful, but there are other factors militating against making large coils to accommodate large samples. One way around this is to use signal averaging.
In the case of earth’s field NMR, the usual way around this problem of very limited signal is to prepolarize the sample.^{3} This basically involves subjecting the sample to a fairly high magnetic field for a brief period before measuring any signals. This prepolarization field forces more of the nuclei to assume the lower energy state, thus increasing the population difference, which means there is a signal to be observed. Mohorič has an excellent but technical discussion of the details of this process (Mohorič and Stepišnik 2009).
What is the Larmor (resonance) frequency in earth’s field? Earth’s magnetic field varies from about 25 to 65 µT; we’ll use an intermediate value of 45 µT for our calculations. The Larmor frequency is given by the equation:

$$\nu = \frac{|\gamma| B_0}{2\pi}$$

Notice there is a simple linear relation between $\nu$ and $B_0$.^{4} If we plug in values for our nuclei we get the following values in Hz:
       1H       19F      31P
19159.852 18034.921 7763.148
What we have shown here is that for EFNMR, resonance frequencies are in the audio (20 to 20,000 Hz) and lower radio (above 20,000 Hz) frequency range. Why is this important? It greatly simplifies signal detection, because audio receivers are essentially radios, and the electronics for working in this frequency range are extremely well worked out, and not expensive to buy or build.
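The linear relation is easy to verify in R with the gyromagnetic ratios from Table 1 (a quick proportionality check, not a substitute for measuring the local field):

```r
# Larmor frequency nu = |gamma| * B0 / (2 * pi), gamma in rad s^-1 T^-1.
larmor_Hz <- function(gamma, B0_T) abs(gamma) * B0_T / (2 * pi)

gamma_1H  <- 26.7522e7
gamma_19F <- 25.1815e7

# Doubling the field doubles the frequency -- the relation is linear:
larmor_Hz(gamma_1H, 90e-6) / larmor_Hz(gamma_1H, 45e-6)   # 2

# At a fixed field, the frequency ratio of two nuclei is the ratio
# of their gyromagnetic ratios:
larmor_Hz(gamma_1H, 45e-6) / larmor_Hz(gamma_19F, 45e-6)
```

This is why a single receiver chain can cover all three nuclei: only the tuning changes, not the electronics.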
The first earth’s field NMR experiment was apparently conducted by Martin Packard and Russell Varian while at Varian Associates (Packard and Varian 1954). Varian Associates was of course a major instrument maker, including NMR instruments, and for a long time marketed them largely toward colleges.^{5}
Keep in mind that buried utilities made of iron or carrying electrical current can interfere.↩︎
$T_1$ is the relaxation time for magnetization aligned with the $B_0$ axis, which by convention corresponds to the $z$ axis. This is the relaxation time that affects the ability to pulse quickly. It’s also called the spin-lattice relaxation time. $T_2$ is the relaxation time corresponding to magnetization in the $xy$ plane, and is also known as the spin-spin relaxation time. $T_2^*$ is largely determined by magnetic field inhomogeneity, and the line width at half peak height is $\Delta\nu_{1/2} = 1/(\pi T_2^*)$. See Friebolin chapter 7 for a detailed discussion.↩︎
In fact, prepolarizing or polarizing the sample is now en vogue for higher-field instruments as well, in the form of DNP, SABRE, etc.↩︎
The gyromagnetic ratio can be negative, hence the absolute value is taken here.↩︎
Martin Packard is apparently unrelated to David Packard, one of the founders of HP.↩︎
@online{hanson2023,
author = {Hanson, Bryan},
title = {Earth’s {Field} {NMR}},
date = {2023-07-26},
url = {http://chemospec.org/posts/20230719EFNMR1/EFNMR1.html},
langid = {en}
}
A number of simple designs for photometers and spectrometers have been published. What drew me to McClain’s approach is that his goal is to teach some basic electronics relevant to instrument design, which is something I have wanted to learn for some time (apparently since 2014, though I think this actually goes back to watching my father build a Heathkit stereo receiver that used tubes). Further, McClain starts with a very simple design, and then adds circuit modules to improve it. Everything is laid out logically and is easy to follow. At each step there is an opportunity to go further to understand how the circuit actually works in detail.
In this post I’ll describe the project at various stages. All the electronics are McClain’s design, but instead of McClain’s cuvette holder I used the design of Kvittingen (Kvittingen et al. (2017)) which uses LEGO bricks as a sample holder and can accommodate an additional detector for fluorescence measurements.
This design is a photometer, and not a spectrophotometer, because only one wavelength at a time can be measured. The source LED must have an emission spectrum overlapping with the absorption maximum ($\lambda_{max}$) of the compound to be measured; LEDs are available which cover pieces of the whole visible spectrum, so it’s pretty easy to swap in a different wavelength range. The detector photodiode (a type of LED, working in reverse) responds over a broad wavelength range, though with greatly varying efficiency. If one wants to measure fluorescence, the photodiode is moved to the 90° position.^{1}
A couple of important notes:
In this version a standard “green” LED (maximum emission at 523 nm) is used as the light source and has the simplest possible power supply. As built, the system provides a current of about 26 mA to the LED. The data sheet recommends 30 mA max.
The detector in this version is a photodiode linked to a TIA, a transimpedance amplifier. This is a current-to-voltage (I to V) converter, and something similar can be used in any instrument where a detector generates a current. Figure 1 shows the circuit.
The main deviation from McClain’s design is that R2 needed to be set to 3 MΩ in order to reach about 1 V on the output; McClain gives a range of 100 kΩ to 1 MΩ. As the value of this resistor goes up, the output voltage goes up due to increasing amplification. This change is likely necessary because the photodiode used here is a bit different from the one McClain specified. After some experimentation, the current on I1 (which replicates the current produced by the photodiode in the simulation) was set to 1/10,000 of the value of the current of D1, based upon currents observed when isolating D2 from the rest of the circuit.
Monitoring the current through and voltage across D2 as built and warmed up, the values were about 0.3 µA and 0.23 V; if the LEGO brick holding D1 was moved immediately adjacent to the one holding D2, these numbers were 0.7 µA and 0.26 V. These readings support the discussion above that the photodiode was generating a relatively small response.
Figure 2 and Figure 3 show the project from each side.
The next step in McClain’s scheme is to change the basic power supply to a more sophisticated “relaxation oscillator” which produces a square wave output with a certain frequency. The idea here is to keep stray room light from affecting the output by using a specific AC-like frequency for the source and then modifying the detector to see only this frequency. Stray room light may consist of random light causing DC offsets in the circuit, or something more deterministic like 60 Hz flicker from light fixtures.
The relaxation oscillator circuit was modeled in CircuitLab before building the circuit. The circuit is in Figure 4 and the simulation results are shown in Figure 5.
Capacitor C2 controls the frequency of the square wave produced by the relaxation oscillator. Figure 6 shows the oscilloscope traces with C2 set to 1 µF, which gives a frequency of about 8 Hz, as seen in the video below. This serves as visual “proof of concept”. Figure 7 shows the oscilloscope traces for a value of 4700 pF for C2, which generates a square wave with a frequency of 1,500 Hz. This is higher than the frequency of any room light flickering and thus will serve as a “carrier” of the absorbance value, unaltered by any stray room light, once we add the other modules to the detection side.
Note that all oscilloscope traces have two vertical scales, one on the left and one on the right, color coordinated with the trace.
The built version of the relaxation oscillator corresponds well with the simulation.
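How C2 sets the frequency can be sketched with the textbook formula for an op amp relaxation oscillator with equal feedback resistors, $f = 1/(2RC\ln 3)$. The timing resistor value below is hypothetical, chosen only to show the scaling; the actual circuit values are McClain’s:

```r
# Textbook op-amp relaxation oscillator with equal feedback resistors:
# f = 1 / (2 * R * C * ln(3)). R below is hypothetical.
osc_freq_Hz <- function(R_ohm, C_farad) 1 / (2 * R_ohm * C_farad * log(3))

R <- 68e3                  # hypothetical timing resistor
osc_freq_Hz(R, 1e-6)       # a few Hz with C = 1 uF
osc_freq_Hz(R, 4700e-12)   # kHz range with C = 4700 pF
```

The frequency is inversely proportional to C, which matches the observed jump from ~8 Hz to ~1,500 Hz when C2 is swapped.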
This final version contains all the circuits as described by McClain. I decided to measure voltages directly at the output rather than use an Arduino and display to provide an absorbance value.
Figure 8 shows the final circuit. Note that several test points are labeled and referred to in the discussion below.
The details of the relaxation oscillator are exactly as described above.
As the simulation of the relaxation oscillator shows, the current output of the op amp is very small. Consequently a simple transistor is used to bump up the current driving the LED source to an appropriate value.
The I to V converter circuit is the same as described earlier.
A high pass filter takes a signal that is timevarying, in our case a square wave, and filters it so that only high frequency components are kept. This is a key part of the detector design, since we create an approximately 1,500 Hz square wave and any other component, like 60 Hz flicker from room lights, should be eliminated. Figure 9 shows an isolated version of our high pass filter, and Figure 10 shows the frequency dependency filtering.
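As a sketch of how such a filter discriminates by frequency, here is the textbook first-order RC high pass response; the R and C values are hypothetical, not taken from McClain’s circuit:

```r
# First-order RC high pass filter: cutoff f_c = 1 / (2 * pi * R * C);
# gain magnitude at frequency f is (f/f_c) / sqrt(1 + (f/f_c)^2).
hp_gain <- function(f_Hz, R_ohm, C_farad) {
  fc <- 1 / (2 * pi * R_ohm * C_farad)
  (f_Hz / fc) / sqrt(1 + (f_Hz / fc)^2)
}

R <- 10e3; C <- 0.1e-6        # hypothetical values, f_c ~ 159 Hz
hp_gain(60, R, C)             # 60 Hz flicker: well below 1 (attenuated)
hp_gain(1500, R, C)           # ~1,500 Hz carrier: close to 1 (passed)
```

Any slowly varying component, including a DC offset, sits far below the cutoff and is suppressed relative to the carrier.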
A half wave rectifier converts an alternating current, alternating between positive and negative values, into a positive only form. Essentially, the negative portion of the signal is converted to positive values, and the positive portion is set to zero. Figure 11 shows the action of the rectifier.
The final step is an active low pass filter which only passes signals below a certain frequency and amplifies them (that’s the active part). Importantly, in addition to amplifying the signal, the op amp emits a steady DC voltage which is ultimately proportional to the current hitting the photodiode. This is the value we are after when making absorbance measurements. Figure 12 shows the actual output.
If we isolate the low pass filter circuit we can try to understand its operation in greater detail. Figure 13 shows the isolated circuit with simulation inputs configured to match the measured inputs.
If we look at the frequency dependence of this circuit, we see that low frequencies are passed relatively unattenuated (Figure 14), as expected. The combination of the earlier high pass filter and this low pass filter amounts to a band pass filter. This suggests a potential follow up design which uses a band pass filter followed by rectification and conversion to DC by some combination of op amps.
In addition to the filtering behavior, we know that the circuit produces a steady DC voltage from the approximately square wave input. Let’s check this using the simulator again, but this time looking at output voltages. Figure 15 shows the results, which should ideally be close to those in Figure 12.
A calibration curve was prepared using a 10 mL plastic syringe and some small bottles. Two drops of red food coloring were added to 10 mL of water to create the first solution. Three mL of the stock solution was added to seven mL of water. This 2nd solution was then diluted in similar fashion and so forth, to get five total solutions. Tap water was used. The green LED was disconnected and the dark current was measured. Next, tap water was used as a blank. Then the voltage for each sample was recorded (voltage measurements are taken at point F in Figure 8). Listing 1 shows the computational steps. Figure 16 shows the samples from most concentrated to least concentrated.
Table 1 shows the results. A calibration curve is shown in Figure 17. Clearly the most concentrated samples exceed the linear behavior expected for Beer’s Law (as observed by McClain). If the two most concentrated samples are dropped, the result is a nice linear relationship, as seen in Figure 18 and the summary of the fit in Listing 2.
| Concentration | Voltage | Absorbance |
|---------------|---------|------------|
| 1.0000        | 0.0262  | 2.611089   |
| 0.3000        | 0.0268  | 2.581818   |
| 0.0900        | 0.0340  | 2.284567   |
| 0.0270        | 0.0999  | 1.074541   |
| 0.0081        | 0.1960  | 0.369747   |
Call:
lm(formula = DF35$Absorbance ~ DF35$Concentration)

Residuals:
       1        2        3 
 0.03688 -0.15983  0.12294 

Coefficients:
                   Estimate Std. Error t value Pr(>|t|)
(Intercept)          0.3118     0.1840   1.694   0.3394
DF35$Concentration  22.3292     3.3801   6.606   0.0956 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.205 on 1 degrees of freedom
Multiple R-squared: 0.9776, Adjusted R-squared: 0.9552
F-statistic: 43.64 on 1 and 1 DF, p-value: 0.09564
Not too bad!
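For reference, here is a minimal sketch of the absorbance arithmetic behind Listing 1, assuming the usual blank/dark correction. The function name and the `V_blank` value are hypothetical stand-ins for the measured values in the actual listing:

```r
# Absorbance from photometer voltages, assuming the standard correction:
# A = -log10((V - V_dark) / (V_blank - V_dark)).
# V_blank below is hypothetical; substitute the measured blank reading.
absorbance <- function(V, V_blank, V_dark = 0) {
  -log10((V - V_dark) / (V_blank - V_dark))
}

V_blank <- 0.50                                  # hypothetical blank (V)
V <- c(0.0262, 0.0268, 0.0340, 0.0999, 0.1960)   # sample voltages, Table 1
round(absorbance(V, V_blank), 3)                 # decreases as V rises
```

The key point is the logarithm: absorbance falls as the detected voltage rises, and the blank defines A = 0.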
I have not tested the fluorescence measurement as other projects are calling me. In addition to changing the position of the photodiode, a few resistors may need to be changed in order to achieve sufficient signal.↩︎
@online{hanson2023,
author = {Hanson, Bryan},
title = {Home {Built} {Photometer}},
date = {2023-07-16},
url = {http://chemospec.org/posts/20230716Photometer/Photometer.html},
langid = {en}
}
The development of simple, home-built NMR instruments over the past two decades is very interesting and appealing. These instruments typically don’t have a magnet, but rather use the earth’s magnetic field and some type of polarization process to improve sensitivity. Most of these instruments use an inexpensive microprocessor like an Arduino or Raspberry Pi to control the instrument, along with some purpose-built electronic circuits. Good examples are the work of Michal (Michal (2010), Michal (2020)), Trevelyan (Manley (2019)) and Bryden (Bryden et al. (2021)). These instruments of course aren’t able to give the same results as higher-field instruments with superconducting magnets or Halbach arrays. What can you do with these instruments? Because earth’s magnetic field is very homogeneous locally, the line widths are very narrow, and thus coupling constants can be measured.^{1} However, the chemical shift range is really small, so structural studies are out. Sensitivity is relatively poor as well. Imaging (MRI) is in principle possible. By the way, there are also examples of DIY Nuclear Quadrupole Resonance (NQR) instruments, which require no magnetic field (Hiblot et al. (2008)).
Recently, a simpler DIY NMR instrument was published as a Hackaday project by Andy Nichol. This “Nuclear Magnetic Resonance for Everybody” project is unique due to its use of only off-the-shelf, commercially available hardware components. Because the hydrogen Larmor precession frequency in earth’s magnetic field is in the audio range, the project uses a standard and readily available audio amplifier to simplify signal detection. In addition, the complexities of pulse programming are avoided by using a mechanical switch to alternate between polarization and detection modes. Finally, a single coil is employed for both polarization and detection. Signal processing is handled by readily available software.
This is an interesting project and it is the most basic entry point into DIY NMR that I have encountered. If it whets your appetite, the project can be made progressively more sophisticated by selectively bringing in the more advanced features of some of the other designs.
Locally homogeneous provided you are away from buildings, electrical transmission lines etc.↩︎
@online{hanson2023,
author = {Hanson, Bryan},
title = {DIY {NMR} in {Earth’s} {Field}},
date = {2023-06-12},
url = {http://chemospec.org/posts/20230612DIYNMR/DIYNMR.html},
langid = {en}
}
@online{hanson2022,
author = {Hanson, Bryan},
title = {You {Can} {Now} {Subscribe}},
date = {2022-11-07},
url = {http://chemospec.org/posts/20221107AnnounceSubscribe/AnnounceSubscribe.html},
langid = {en}
}
Back in Part 2 I mentioned some of the challenges of learning linear algebra. One of those challenges is making sense of all the special types of matrices one encounters. In this post I hope to shed a little light on that topic.
I am strongly drawn to thinking in terms of categories and relationships. I find visual presentations like phylogenies showing the relationships between species very useful. In the course of my linear algebra journey, I came across an interesting Venn diagram developed by the very creative thinker Kenji Hiranabe. The diagram is discussed at Matrix World, but the latest version is at the Github link. A Venn diagram is a useful format, but I was inspired to recast the information in a different format. Figure 1 shows a taxonomy I created using a portion of the information in Hiranabe’s Venn diagram.^{1} The taxonomy is primarily organized around what I am calling the structure of a matrix: what does it look like upon visual inspection? Of course this is most obvious with small matrices. To me at least, structure is one of the most obvious characteristics of a matrix: an upper triangular matrix really stands out for instance. Secondarily, the taxonomy includes a number of queries that one can ask about a matrix: for instance, is the matrix invertible? We’ll need to expand on all of this of course, but first take a look at the figure.^{2}
Let’s use R to construct and inspect examples of each type of matrix. We’ll use integer matrices to keep the print output nice and neat, but of course real numbers could be used as well.^{3} Most of these are pretty straightforward, so we’ll keep comments to a minimum for the simple cases.
A_rect <- matrix(1:12, nrow = 3) # if you give nrow,
A_rect # R will compute ncol from the length of the data
[,1] [,2] [,3] [,4]
[1,] 1 4 7 10
[2,] 2 5 8 11
[3,] 3 6 9 12
Notice that R is “column major”, meaning data fills the first column, then the second column, and so forth.
A_row <- matrix(1:4, nrow = 1)
A_row
[,1] [,2] [,3] [,4]
[1,] 1 2 3 4
A_col <- matrix(1:4, ncol = 1)
A_col
[,1]
[1,] 1
[2,] 2
[3,] 3
[4,] 4
Keep in mind that to save space in a text-dense document one would often write A_col as its transpose.^{4}
A_sq <- matrix(1:9, nrow = 3)
A_sq
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
Creating an upper triangular matrix requires a few more steps. Function upper.tri() returns a logical matrix which can be used as a mask to select entries. Function lower.tri() can be used similarly. Both functions have an argument diag = TRUE/FALSE indicating whether to include the diagonal.^{5}
upper.tri(A_sq, diag = TRUE)
[,1] [,2] [,3]
[1,] TRUE TRUE TRUE
[2,] FALSE TRUE TRUE
[3,] FALSE FALSE TRUE
A_upper <- A_sq[upper.tri(A_sq)] # subset using the logical mask
A_upper # notice that a vector is returned, not quite what might have been expected!
[1] 4 7 8
A_upper <- A_sq # instead, create a copy to be modified
A_upper[lower.tri(A_upper)] <- 0L # assign the lower entries to zero
A_upper
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 0 5 8
[3,] 0 0 9
Notice that to create an upper triangular matrix we use lower.tri() to assign zeros to the lower part of an existing matrix.
If you give diag() a single value, it defines the dimensions and creates a matrix with ones on the diagonal, in other words, an identity matrix.
A_ident <- diag(4)
A_ident
[,1] [,2] [,3] [,4]
[1,] 1 0 0 0
[2,] 0 1 0 0
[3,] 0 0 1 0
[4,] 0 0 0 1
If instead you give diag() a vector of values, these go on the diagonal and the length of the vector determines the dimensions.
A_diag <- diag(1:4)
A_diag
[,1] [,2] [,3] [,4]
[1,] 1 0 0 0
[2,] 0 2 0 0
[3,] 0 0 3 0
[4,] 0 0 0 4
Matrices created by diag() are symmetric matrices, but any matrix where $A = A^T$ is symmetric. There is no general function to create symmetric matrices since there is no way to know what data should be used. However, one can ask if a matrix is symmetric, using the function isSymmetric().
isSymmetric(A_diag)
[1] TRUE
Let’s take the queries in the taxonomy in order, as the hierarchy is everything.
A singular matrix is one in which one or more rows are multiples of another row, or alternatively, one or more columns are multiples of another column. Why do we care? Well, it turns out a singular matrix is a bit of a dead end; you can’t do much with it. An invertible matrix, however, is a very useful entity and has many applications. What is an invertible matrix? In simple terms, being invertible means the matrix has an inverse. This is not the same as the algebraic definition of an inverse, which is related to division: $a \times a^{-1} = a \times \frac{1}{a} = 1$.
Instead, for matrices, invertibility of $A$ is defined as the existence of another matrix $B$ such that

$$AB = BA = I$$

Just as $a^{-1}$ cancels out $a$ in $a \times a^{-1} = 1$, $B$ cancels out $A$ to give the identity matrix $I$. In other words, $B$ is really $A^{-1}$.
A singular matrix has determinant of zero. On the other hand, an invertible matrix has a nonzero determinant. So to determine which type of matrix we have before us, we can simply compute the determinant.
Let’s look at a few simple examples.
A_singular <- matrix(c(1, 2, 3, 6), nrow = 2, ncol = 2)
A_singular # notice that col 2 is col 1 * 3, they are not independent
[,1] [,2]
[1,] 1 3
[2,] 2 6
det(A_singular)
[1] 0
A_invertible <- matrix(c(2, 2, 7, 8), nrow = 2, ncol = 2)
A_invertible
[,1] [,2]
[1,] 2 7
[2,] 2 8
det(A_invertible)
[1] 2
A matrix that is diagonalizable can be expressed as:

$$A = P D P^{-1}$$

where $D$ is a diagonal matrix – the diagonalized version of the original matrix $A$. How do we find out if this is possible, and if possible, what are the values of $P$ and $D$? The answer is to decompose $A$ using the eigendecomposition:

$$A = Q \Lambda Q^{-1} \tag{4}$$

where $Q$ is the matrix of eigenvectors and $\Lambda$ is the diagonal matrix of eigenvalues.
Now there is a lot to know about the eigendecomposition, but for now let’s just focus on a few key points:
We can answer the original question by using the eigen() function in R. Let’s do an example.
A_eigen <- matrix(c(1, 0, -2, 2, 3, 4, 0, 0, 2), ncol = 3)
A_eigen
     [,1] [,2] [,3]
[1,]    1    2    0
[2,]    0    3    0
[3,]   -2    4    2
eA <- eigen(A_eigen)
eA
eigen() decomposition
$values
[1] 3 2 1
$vectors
[,1] [,2] [,3]
[1,] 0.4082483 0 0.4472136
[2,] 0.4082483 0 0.0000000
[3,] 0.8164966 1 0.8944272
Since eigen(A_eigen) was successful, we can conclude that A_eigen was diagonalizable. You can see the eigenvalues and eigenvectors in the returned value. We can reconstruct A_eigen using Equation 4:
eA$vectors %*% diag(eA$values) %*% solve(eA$vectors)
     [,1] [,2] [,3]
[1,]    1    2    0
[2,]    0    3    0
[3,]   -2    4    2
Remember, diag() creates a matrix with the values along the diagonal, and solve() computes the inverse when it gets only one argument.
The only loose end is which matrices are not diagonalizable? These are covered in this Wikipedia article. Briefly, most nondiagonalizable matrices are fairly exotic and real data sets will likely not be a problem.
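For the curious, here is a quick way to see non-diagonalizability numerically. The classic example from the Wikipedia article is a Jordan block: it has a repeated eigenvalue but only one independent eigenvector, so the eigenvector matrix returned by eigen() is singular and $Q \Lambda Q^{-1}$ cannot be formed:

```r
# A 2x2 Jordan block: eigenvalue 1 repeated, but only one independent
# eigenvector, so this matrix is not diagonalizable.
J <- matrix(c(1, 0, 1, 1), nrow = 2)  # [[1, 1], [0, 1]] (column-major fill)
eJ <- eigen(J)
eJ$values        # 1, 1 (repeated)
det(eJ$vectors)  # essentially 0: the eigenvector matrix is singular
```

If you try solve(eJ$vectors) on a defective matrix like this, R will complain that the system is computationally singular, which is the numerical signature of non-diagonalizability.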
In texts, eigenvalues and eigenvectors are universally introduced as a scaling relationship

$$A\mathbf{v} = \lambda\mathbf{v} \tag{5}$$

where $\mathbf{v}$ is a column eigenvector and $\lambda$ is a scalar eigenvalue. One says “$A$ scales $\mathbf{v}$ by a factor of $\lambda$.” A single vector is used as one can readily illustrate how that vector grows or shrinks in length when multiplied by $A$. Let’s call this the “bottom up” explanation.
Let’s check that Equation 5 is true using our values from above by extracting the first eigenvector and eigenvalue from eA. Notice that we are using regular multiplication on the right-hand side, i.e. *, rather than %*%, because eA$values[1] is a scalar. Also on the right-hand side, we have to add drop = FALSE to the subsetting process or the result is no longer a matrix.^{7}
isTRUE(all.equal(
A_eigen %*% eA$vectors[,1],
eA$values[1] * eA$vectors[,1, drop = FALSE]))
[1] TRUE
If instead we start from Equation 4 and rearrange it to show the relationship between $A$ and $Q$ we get:

$$AQ = Q\Lambda \tag{6}$$

Let’s call this the “top down” explanation. We can verify this as well, making sure to convert eA$values to a diagonal matrix, as the values are stored as a vector to save space.
isTRUE(all.equal(A_eigen %*% eA$vectors, eA$vectors %*% diag(eA$values)))
[1] TRUE
Notice that in Equation 6, $\Lambda$ is on the right of $Q$, but in Equation 5 the corresponding value, $\lambda$, is to the left of $\mathbf{v}$. This is a bit confusing until one realizes that Equation 5 could have been written

$$A\mathbf{v} = \mathbf{v}\lambda$$

since $\lambda$ is a scalar. It’s too bad that the usual, bottom up, presentation seems to conflict with the top down approach. Perhaps the choice in Equation 5 is a historical artifact.
A normal matrix is one where $A A^T = A^T A$. As far as I know, there is no function in R to check this condition, but we’ll write our own in a moment. One reason being “normal” is interesting is that if $A$ is a normal matrix, then the result of the eigendecomposition changes slightly:

$$A = Q \Lambda Q^T$$

where $Q$ is an orthogonal matrix, which we’ll talk about next.
An orthogonal matrix takes the definition of a normal matrix one step further: $Q^T Q = Q Q^T = I$. If a matrix is orthogonal, then its transpose is equal to its inverse: $Q^T = Q^{-1}$, which of course makes any special computation of the inverse unnecessary. This is a significant advantage in computations.
To aid our learning, let’s write a simple function that will report if a matrix is normal, orthogonal, or neither.^{8}
normal_or_orthogonal <- function(M) {
  if (!inherits(M, "matrix")) stop("M must be a matrix")
  norm <- orthog <- FALSE
  tst1 <- M %*% t(M)
  tst2 <- t(M) %*% M
  norm <- isTRUE(all.equal(tst1, tst2))
  if (norm) orthog <- isTRUE(all.equal(tst1, diag(dim(M)[1])))
  if (orthog) message("This matrix is orthogonal\n") else
    if (norm) message("This matrix is normal\n") else
      message("This matrix is neither orthogonal nor normal\n")
  invisible(NULL)
}
And let’s run a couple of tests.
normal_or_orthogonal(A_singular)
This matrix is neither orthogonal nor normal
Norm <- matrix(c(1, 0, 1, 1, 1, 0, 0, 1, 1), nrow = 3)
normal_or_orthogonal(Norm)
This matrix is normal
normal_or_orthogonal(diag(3)) # the identity matrix is orthogonal
This matrix is orthogonal
Orth <- matrix(c(0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0), nrow = 4)
normal_or_orthogonal(Orth)
This matrix is orthogonal
The columns of an orthogonal matrix are orthogonal to each other. We can show this by taking the dot product between any pair of columns. Remember, if the dot product is zero, the vectors are orthogonal.
t(Orth[,1]) %*% Orth[,2] # col 1 dot col 2
[,1]
[1,] 0
t(Orth[,1]) %*% Orth[,3] # col 1 dot col 3
[,1]
[1,] 0
Finally, not only are the columns orthogonal, but each column vector has length one, making them orthonormal.
sqrt(sum(Orth[,1]^2))
[1] 1
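All of these checks can be done at once: for an orthogonal matrix, t(Orth) %*% Orth is the identity, so every pairwise dot product is zero and every column length is one. The base function crossprod() computes exactly this product:

```r
# For an orthogonal matrix, crossprod(Orth) (i.e., t(Orth) %*% Orth) is the
# identity: off-diagonal 0s show orthogonality, diagonal 1s show unit length.
Orth <- matrix(c(0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0), nrow = 4)
crossprod(Orth)  # 4x4 identity matrix
```

This one-line check is handy whenever a decomposition is supposed to return an orthogonal factor.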
Taking these queries together, we see that symmetric and diagonal matrices are necessarily diagonalizable and normal, and are invertible as long as their determinant is nonzero. They are not, however, orthogonal in general. Identity matrices, however, have all these properties. Let’s double-check these statements.
A_sym <- matrix(
  c(1, 5, 4, 5, 2, 9, 4, 9, 3),
  ncol = 3) # symmetric matrix, not diagonal
A_sym
[,1] [,2] [,3]
[1,] 1 5 4
[2,] 5 2 9
[3,] 4 9 3
normal_or_orthogonal(A_sym)
This matrix is normal
normal_or_orthogonal(diag(1:3)) # diagonal matrix, symmetric, but not the identity matrix
This matrix is normal
normal_or_orthogonal(diag(3)) # identity matrix (also symmetric, diagonal)
This matrix is orthogonal
So what’s the value of these queries? As mentioned, they help us understand the relationships between different types of matrices, so they help us learn more deeply. On a practical computational level they may not have much value, especially when dealing with realworld data sets. However, there are some other interesting aspects of these queries that deal with decompositions and eigenvalues. We might cover these in the future.
A more personal thought: In the course of writing these posts, and learning more linear algebra, it increasingly seems to me that a lot of the “effort” that goes into linear algebra is about making tedious operations simpler. Anytime one can have more zeros in a matrix, or have orthogonal vectors, or break a matrix into parts, the simpler things become. However, I haven’t really seen this point driven home in texts or tutorials. I think linear algebra learners would do well to keep this in mind.
These are the main sources I relied on for this post.
I’m only using a portion because Hiranabe’s original contains a bit too much information for someone trying to get their footing in the field.↩︎
I’m using the term taxonomy a little loosely, of course; you can call it whatever you want. The name is not so important really; what is important is the hierarchy of concepts.↩︎
As could complex numbers.↩︎
Usually in written text a row matrix, sometimes called a row vector, is written as [x₁ x₂ … xₙ]. In order to save space in documents, rather than writing a column matrix/vector stacked vertically, it can be kept to a single line by writing it as its transpose: [x₁ x₂ … xₙ]ᵀ, but this requires a little mental gymnastics to visualize.↩︎
Upper and lower triangular matrices play a special role in linear algebra. Because of the presence of many zeros, multiplying them and inverting them is relatively easy, because the zeros cause terms to drop out.↩︎
This idea of the “most natural basis” is most easily visualized in two dimensions. If you have some data plotted on and axes, determining the line of best fit is one way of finding the most natural basis for describing the data. However, more generally and in more dimensions, principal component analysis (PCA) is the most rigorous way of finding this natural basis, and PCA can be calculated with the eigen()
function. Lots more information here.↩︎
The drop argument to subsetting/extracting defaults to TRUE, which means that if subsetting reduces the necessary number of dimensions, the unneeded dimension attributes are dropped. Under the default, selecting a single column of a matrix leads to a vector, not a one-column matrix. In this all.equal() expression we need both sides to evaluate to a matrix.↩︎
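A minimal illustration of the drop behavior (the names here are arbitrary):

```r
M <- matrix(1:6, nrow = 3)

v  <- M[, 1]                 # default drop = TRUE: result is a plain vector
m1 <- M[, 1, drop = FALSE]   # keeps the dimensions: a 3 x 1 matrix

is.matrix(v)   # FALSE
is.matrix(m1)  # TRUE
```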
One might ask why R does not provide a user-facing version of such a function. I think a good argument can be made that the authors of R passed down a robust and lean set of linear algebra functions, geared toward getting work done and throwing errors as necessary.↩︎
@online{hanson2022,
author = {Hanson, Bryan},
title = {Notes on {Linear} {Algebra} {Part} 4},
date = {2022-09-26},
url = {http://chemospec.org/posts/20221107AnnounceSubscribe/20220926LinearAlgNotesPt4/LinearAlgNotesPt4.html},
langid = {en}
}
Update 19 September 2022: in “Use of outer() for Matrix Multiplication”, corrected use of “cross” to be “outer” and added an example in R. Also added links to work by Hiranabe.
This post is a survey of the linear algebra-related functions from base R. Some of these I’ve discussed in other posts and some I may discuss in the future, but this post is primarily an inventory: these are the key tools we have available. “Notes” in the table are taken from the help files.
Matrices, including row and column vectors, will be shown in bold, e.g. **A** or **x**, while scalars and variables will be shown in script, e.g. x. R code will appear like `x <- y`.
In the table, **U** or **R** is an upper/right triangular matrix, and **L** is a lower/left triangular matrix (triangular matrices are square). **A** is a generic matrix of dimensions m x n. **S** is a square matrix of dimensions n x n.
| Function | Uses | Notes |
|---|---|---|
| **operators** | | |
| `*` | scalar multiplication | |
| `%*%` | matrix multiplication | two vectors → the dot product; vector and matrix → cross product (the vector will be promoted as needed)^{1} |
| **basic functions** | | |
| `t()` | transpose | interchange rows and columns |
| `crossprod()` | matrix multiplication | faster version of `t(A) %*% A` |
| `tcrossprod()` | matrix multiplication | faster version of `A %*% t(A)` |
| `outer()` | outer product & more | see discussion below |
| `det()` | computes determinant | uses the LU decomposition; the determinant is a volume |
| `isSymmetric()` | name says it all | |
| `Conj()` | computes complex conjugate | |
| **decompositions** | | |
| `backsolve()` | solves `U x = b` | `U` is upper/right triangular |
| `forwardsolve()` | solves `L x = b` | `L` is lower/left triangular |
| `solve()` | solves `A x = b` and computes inverses | e.g. linear systems; if given only one matrix, returns the inverse |
| `qr()` | solves `A = QR` | `Q` is an orthogonal matrix; can be used to solve `A x = b`; see `?qr` for several `qr.*` extractor functions |
| `chol()` | solves `M = t(R) %*% R` | only applies to positive semi-definite matrices; related to the LU decomposition |
| `chol2inv()` | computes the inverse of `M` from the results of `chol(M)` | |
| `svd()` | singular value decomposition | input is any matrix `A`; can compute PCA |
| `eigen()` | eigen decomposition | requires a square matrix `S`; can compute PCA |
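As a quick check of the `chol()` and `chol2inv()` entries, here is a sketch using a small, arbitrary positive-definite matrix:

```r
M <- matrix(c(4, 2, 2, 3), nrow = 2)  # symmetric, positive definite
R <- chol(M)                          # upper triangular factor: M = t(R) %*% R

all.equal(t(R) %*% R, M)              # the factor reproduces M
all.equal(chol2inv(R), solve(M))      # the inverse recovered from the factor
```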
One thing to notice is that there is no LU decomposition in base R. It is apparently used “under the hood” in solve(), and there are versions available in contributed packages.^{2}
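For instance, the recommended Matrix package (installed with R, though not attached by default) provides lu(). This is a sketch; the exact structure of the expanded result is worth confirming in ?Matrix::lu:

```r
library(Matrix)  # recommended package, installed with standard R

A <- Matrix(c(3, 5, 1, 11, 2, 0, 5, 2, 5), nrow = 3)
decomp <- expand(lu(A))  # list holding L, U and a permutation matrix P

decomp$L  # unit lower triangular factor
decomp$U  # upper triangular factor
```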
As seen in Part 1, calling outer() on two vectors does indeed give the cross product (technically corresponding to tcrossprod()). This works because the defaults carry out multiplication.^{3} However, looking through the R source code for uses of outer(), the function should really be thought of in simple terms as creating all possible combinations of the two inputs. In that way it is similar to expand.grid(). Here are two illustrations of the flexibility of outer():
# generate a grid of x,y values modified by a function
# from ?colorRamp
m <- outer(1:20, 1:20, function(x,y) sin(sqrt(x*y)/3))
str(m)
num [1:20, 1:20] 0.327 0.454 0.546 0.618 0.678 ...
# generate all combinations of month and year
# modified from ?outer; any function accepting 2 args can be used
outer(month.abb, 2000:2002, FUN = paste)
[,1] [,2] [,3]
[1,] "Jan 2000" "Jan 2001" "Jan 2002"
[2,] "Feb 2000" "Feb 2001" "Feb 2002"
[3,] "Mar 2000" "Mar 2001" "Mar 2002"
[4,] "Apr 2000" "Apr 2001" "Apr 2002"
[5,] "May 2000" "May 2001" "May 2002"
[6,] "Jun 2000" "Jun 2001" "Jun 2002"
[7,] "Jul 2000" "Jul 2001" "Jul 2002"
[8,] "Aug 2000" "Aug 2001" "Aug 2002"
[9,] "Sep 2000" "Sep 2001" "Sep 2002"
[10,] "Oct 2000" "Oct 2001" "Oct 2002"
[11,] "Nov 2000" "Nov 2001" "Nov 2002"
[12,] "Dec 2000" "Dec 2001" "Dec 2002"
Bottom line: outer() can be used for linear algebra, but its main uses lie elsewhere. You don’t need it for linear algebra!
Here’s an interesting connection discussed in this Wikipedia entry. In Part 1 we demonstrated how the repeated application of the dot product underpins matrix multiplication. The first row of the first matrix is multiplied elementwise by the first column of the second matrix, shown in red, to give the first element of the answer matrix. This process is then repeated so that every row (first matrix) has been multiplied by every column (second matrix).
If instead, we treat the first column of the first matrix as a column vector and outer multiply it by the first row of the second matrix as a row vector, we get the following matrix:
Now if you repeat this process for the second column of the first matrix and the second row of the second matrix, you get another matrix. And if you do it one more time using the third column/third row, you get a third matrix. If you then add these three matrices together, you get as seen in Equation 1. Notice how each element in in Equation 1 is a sum of three terms? Each of those terms comes from one of the three matrices just described.
To sum up, one can use the dot product on each row (first matrix) by each column (second matrix) to get the answer, or you can use the outer product on the columns sequentially (first matrix) by rows sequentially (second matrix) to get several matrices, which one then sums to get the answer. It’s pretty clear which option is less work and easier to follow, but I think it’s an interesting connection between operations. The first case corresponds to view “MM1” in The Art of Linear Algebra while the second case is view “MM4”. See this work by Kenji Hiranabe.
Here’s a simple demonstration in R.
M1 <- matrix(1:6, nrow = 3, byrow = TRUE)
M1
[,1] [,2]
[1,] 1 2
[2,] 3 4
[3,] 5 6
M2 <- matrix(7:10, nrow = 2, byrow = TRUE)
M2
[,1] [,2]
[1,] 7 8
[2,] 9 10
tst1 <- M1 %*% M2 # uses dot product
# next line is sum of sequential outer products:
# 1st col M1 by 1st row M2 + 2nd col M1 by 2nd row M2
tst2 <- outer(M1[,1], M2[1,]) + outer(M1[,2], M2[2,])
all.equal(tst1, tst2)
[1] TRUE
For details see the discussion in Part 1.↩︎
Discussed in this Stackoverflow question, which also has an implementation.↩︎
In fact, for the default outer(), where FUN = "*", outer() actually calls tcrossprod().↩︎
@online{hanson2022,
author = {Hanson, Bryan},
title = {Notes on {Linear} {Algebra} {Part} 3},
date = {2022-09-10},
url = {http://chemospec.org/posts/20220910LinearAlgNotesPt3/LinearAlgNotesPt3.html},
langid = {en}
}
For Part 1 of this series, see here.
If you open a linear algebra text, it’s quickly apparent how complex the field is. There are so many special types of matrices, so many different decompositions of matrices. Why are all these needed? Should I care about null spaces? What’s really important? What are the threads that tie the different concepts together? As someone who is trying to improve their understanding of the field, especially with regard to its applications in chemometrics, it can be a tough slog.
In this post I’m going to try to demonstrate how some simple chemometric tasks can be solved using linear algebra. Though I cover some math here, the math is secondary right now – the conceptual connections are more important. I’m more interested in finding (and sharing) a path through the thicket of linear algebra. We can return as needed to expand the basic math concepts. The cognitive effort to work through the math details is likely a lot lower if we have a sense of the big picture.
In this post, matrices, including row and column vectors, will be shown in bold, e.g. **A**, while scalars and variables will be shown in script, e.g. x. Variables used in R code will appear like `A`.
If you’ve had algebra, you have certainly run into “system of equations” such as the following:
In algebra, such systems can be solved several ways, for instance by isolating one or more variables and substituting, or geometrically (particularly for 2D systems, by plotting the lines and looking for the intersection). Once there are more than a few variables however, the only manageable way to solve them is with matrix operations, or more explicitly, linear algebra. This sort of problem is the core of linear algebra, and the reason the field is called linear algebra.
To solve the system above using linear algebra, we have to write it in the form of matrices and column vectors:
or more generally
where **A** is the matrix of coefficients, **x** is the column vector of variable names^{1} and **b** is a column vector of constants. Notice that these matrices are conformable:^{2}
To solve such a system, when we have n unknowns, we need n equations.^{3} This means that **A** has to be a square matrix, and square matrices play a special role in linear algebra. I’m not sure this point is always conveyed clearly when this material is introduced. In fact, many texts on linear algebra seem to bury the lede.
To find the values of **x**,^{4} we can do a little rearranging following the rules of linear algebra and matrix operations. First we premultiply both sides by the inverse of **A**, which then gives us the identity matrix **I**, which drops out.^{5}
So it’s all sounding pretty simple, right? Ha. This is actually where things potentially break down. For this to work, **A** must be invertible, which is not always the case.^{6} If there is no inverse, then the system of equations has either no solution or infinite solutions. So finding the inverse of a matrix, or discovering it doesn’t exist, is essential to solving these systems of linear equations.^{7} More on this eventually, but for now, we know **A** must be a square matrix and we hope it is invertible.
We learn in algebra that a line takes the form y = mx + b. If one has measurements in the form of x, y pairs that one expects to fit to a line, we need linear regression. Carrying out a linear regression is arguably one of the most important, and certainly one of the most common, applications of the linear systems described above. One can get the values of m and b by hand using algebra, but any computer will solve the system using a matrix approach.^{8} Consider this data:
To express this in a matrix form, we recast
into
where:
With our data above, this looks like:
If we multiply this out, each row works out to be an instance of y = mx + b. Hopefully you can appreciate that β₀ corresponds to b and β₁ corresponds to m.^{9}
This looks similar to **Ax** = **b** seen in Equation 3, if you set **A** to **X**, **x** to **β** and **b** to **y**:
This contortion of symbols is pretty nasty, but honestly not uncommon when moving about in the world of linear algebra.
As it is composed of real data, presumably with measurement errors, there is not an exact solution to **Xβ** = **y** due to the error term. There is, however, an approximate solution, which is what is meant when we say we are looking for the line of best fit. This is how linear regression is carried out on a computer. The relevant equation is:
The key point here is that once again we need to invert a matrix to solve this. The details of where Equation 11 comes from are covered in a number of places, but I will note here that β̂ refers to the best estimate of **β**.^{10}
We now have two examples where inverting a matrix is a key step: solving a system of linear equations, and approximating the solution to a system of linear equations (the regression case). These cases are not outliers; the ability to invert a matrix is very important. So how do we do this? The LU decomposition can do it, and it is widely used, so it is worth spending some time on. A decomposition is the process of breaking a matrix into pieces that are easier to handle, or that give us special insight, or both. If you are a chemometrician you have almost certainly carried out Principal Components Analysis (PCA). Under the hood, PCA requires either a singular value decomposition or an eigen decomposition (more info here).
So, about the LU decomposition: it breaks a matrix into two matrices, , a “lower triangular matrix”, and , an “upper triangular matrix”. These special matrices contain only zeros except along the diagonal and the entries below it (in the lower case), or along the diagonal and the entries above it (in the upper case). The advantage of triangular matrices is that they are very easy to invert (all those zeros make many terms drop out). So the LU decomposition breaks the tough job of inverting into two easier jobs.
When all is done, we only need to figure out **L**⁻¹ and **U**⁻¹, which as mentioned is straightforward.^{11}
To summarize, if we want to solve a system of equations we need to carry out matrix inversion, which in turn is much easier to do if one uses the LU decomposition to get two easy-to-invert triangular matrices. I hope you are beginning to see how the pieces of linear algebra fit together, and why it might be good to learn more.
Let’s look at how R
does these operations, and check our understanding along the way. R
makes this really easy. We’ll start with the issue of invertibility. Let’s create a matrix for testing.
A1 <- matrix(c(3, 5, 1, 11, 2, 0, 5, 2, 5), ncol = 3)
A1
[,1] [,2] [,3]
[1,] 3 11 5
[2,] 5 2 2
[3,] 1 0 5
In the matlib package there is a function inv() that inverts matrices. It returns the inverted matrix, which we can verify by multiplying the inverted matrix by the original matrix to give the identity matrix (if inversion was successful). diag(3) creates a 3 x 3 matrix with 1’s on the diagonal, in other words an identity matrix.
library("matlib")
A1_inv <- inv(A1)
all.equal(A1_inv %*% A1, diag(3))
[1] "Mean relative difference: 8.999999e08"
The difference here is really small, but not zero. Let’s use a different function, solve(), which is part of base R. If solve() is given a single matrix, it returns the inverse of that matrix.
A1_solve <- solve(A1) %*% A1
all.equal(A1_solve, diag(3))
[1] TRUE
That’s a better result. Why are there differences? inv() uses a method called Gaussian elimination, which is similar to how one would invert a matrix using pencil and paper. On the other hand, solve() uses the LU decomposition discussed earlier, and no matrix inversion is necessary. It looks like the LU decomposition gives a somewhat better numerical result.
Now let’s look at a different matrix, created by replacing the third column of A1 with different values.
A2 <- matrix(c(3, 5, 1, 11, 2, 0, 6, 10, 2), ncol = 3)
A2
[,1] [,2] [,3]
[1,] 3 11 6
[2,] 5 2 10
[3,] 1 0 2
And let’s compute its inverse using solve().
solve(A2)
Error in solve.default(A2): system is computationally singular: reciprocal condition number = 6.71337e19
When R reports that A2 is computationally singular, it is saying that it cannot be inverted. Why not? If you look at A2, notice that column 3 is a multiple of column 1. Anytime one column is a multiple of another, or one row is a multiple of another, the matrix cannot be inverted because the rows or columns are not independent.^{12} If this were a matrix of coefficients from an experimental measurement of variables, it would mean that some of your variables are not independent; they must be measuring the same underlying phenomenon.
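One quick way to see this dependence numerically is the rank reported by the QR decomposition (a sketch; see also the eigenvalue approach in the footnote):

```r
A2 <- matrix(c(3, 5, 1, 11, 2, 0, 6, 10, 2), ncol = 3)
qr(A2)$rank  # 2: only two of the three columns are independent
```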
Let’s solve the system from Equation 2. It turns out that the solve() function also handles this case, if you give it two arguments. Remember, solve() is using the LU decomposition behind the scenes; no matrix inversion is required.
A3 <- matrix(c(1, 2, 3, 2, 1, 2, 3, 1, 1), ncol = 3)
A3
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 2 1 1
[3,] 3 2 1
colnames(A3) <- c("x", "y", "z") # naming the columns will label the answer
b <- c(3, 11, 5)
solve(A3, b)
x y z
2 4 3
The answer is the values of x, y and z that make the system of equations true.
While we’ve emphasized the importance and challenges of inverting matrices, we’ve also pointed out that to solve a linear system there are alternatives to looking at the problem from the perspective of Equation 5. Here’s an approach using the LU decomposition, starting with substituting **LU** for **A**:
We want to solve for **x**, the column vector of variables. To do so, define a new vector **y** = **Ux** and substitute it in:
Next we solve for **y**. One way we could do this is to premultiply both sides by **L**⁻¹, but we are looking for a way to avoid using the inverse. Instead, we evaluate **Ly** = **b** to give a series of expressions using the dot product (in other words, plain matrix multiplication). Because **L** is lower triangular, many of the terms we might have gotten actually disappear because of the zero coefficients. What remains is simple enough that we can algebraically find each element of **y**, starting from the first row (this is called forward substitution). Once we have **y**, we can find **x** by solving **Ux** = **y** using a similar approach, but working from the last row upward (this is backward substitution). This is a good illustration of the utility of triangular matrices: some operations can move from the linear algebra realm to the algebra realm. Wikipedia has a good illustration of forward and backward substitution.
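The two substitution steps can be sketched in base R using a hand-made L and U (the numbers here are arbitrary):

```r
L <- matrix(c(1, 2, 3,  0, 1, 4,  0, 0, 1), nrow = 3)  # lower triangular
U <- matrix(c(2, 0, 0,  1, 3, 0,  4, 5, 6), nrow = 3)  # upper triangular
b <- c(1, 2, 3)

y <- forwardsolve(L, b)  # forward substitution: solve L y = b
x <- backsolve(U, y)     # backward substitution: solve U x = y

all.equal(as.vector(L %*% U %*% x), b)  # x solves (LU) x = b
```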
Let’s compute the values of β̂ for our regression data shown in Equation 6. First, let’s set up the needed matrices and plot the data, since visualizing the data is always a good idea.
y <- matrix(c(11.8, 7.2, 21.5, 17.2, 26.8), ncol = 1)
X <- matrix(c(rep(1, 5), 2.1, 0.9, 3.9, 3.2, 5.1), ncol = 2) # design matrix
X
[,1] [,2]
[1,] 1 2.1
[2,] 1 0.9
[3,] 1 3.9
[4,] 1 3.2
[5,] 1 5.1
plot(X[,2], y, xlab = "x") # column 2 of X has the x values
The value of β̂ can be found via Equation 11:
solve((t(X) %*% X)) %*% t(X) %*% y
[,1]
[1,] 2.399618
[2,] 4.769862
The first value is β₀, or b, the intercept; the second value is β₁, or m, the slope.
Let’s compare this answer to R’s built-in lm() function (for linear model):
fit <- lm(y ~ X[,2])
fit
Call:
lm(formula = y ~ X[, 2])
Coefficients:
(Intercept) X[, 2]
2.40 4.77
We have good agreement! If you care to learn about the goodness of the fit, the residuals, etc., then you can look at the help file ?lm and str(fit). lm() returns pretty much all one needs to know about the results, but if you wish to calculate all the interesting values yourself, you can do so by manipulating Equation 11 and its relatives.
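The pseudoinverse route can also be checked against MASS::ginv() (the MASS package is installed with R; the data are the regression values from above). This is a sketch:

```r
library(MASS)

y <- matrix(c(11.8, 7.2, 21.5, 17.2, 26.8), ncol = 1)
X <- matrix(c(rep(1, 5), 2.1, 0.9, 3.9, 3.2, 5.1), ncol = 2)

b1 <- solve(t(X) %*% X) %*% t(X) %*% y  # pseudoinverse written out
b2 <- ginv(X) %*% y                     # Moore-Penrose pseudoinverse via MASS

all.equal(b1, b2, check.attributes = FALSE)  # same coefficients
```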
Finally, let’s plot the line of best fit found by lm() to make sure everything looks reasonable.
plot(X[,2], y, xlab = "x")
abline(coef = coef(fit), col = "red")
That’s all for now, and a lot to digest. I hope you are closer to finding your own path through linear algebra. Remember that investing in learning the fundamentals prepares you for tackling the more complex topics. Thanks for reading!
These are the main sources I relied on for this post.
The vignettes of the matlib package are very helpful.

Here we have the slightly unfortunate circumstance where symbol conventions cannot be completely harmonized. We are saying that **x** is the vector of variables, which seems a bit silly since vector **x** contains y and z components in addition to x. I ask you to accept this for two reasons: First, most linear algebra texts use the symbols in Equation 3 as the general form for this topic, so if you go to study this further that’s what you’ll find. Second, I feel like using x, y and z in Equation 1 will be familiar to most people. If you want to get rid of this infelicity, then you have to write Equation 1 (in part) using subscripted variables such as x₁, x₂ and x₃, which I think clouds the interpretation. Perhaps, however, you feel my choices are equally bad.↩︎
Conformable means that the number of columns in the first matrix equals the number of rows in the second matrix. This is necessary because of the dot product definition of matrix multiplication. More details here.↩︎
Remember “story problems” where you had to read closely to express what was given in terms of equations, and find enough equations? “If Sally bought 10 pieces of candy and a drink for $1.50…”↩︎
We could also write the column vector on one line as its transpose, to emphasize that it is a column vector. One might prefer this because the only vector one can write in a row of text is a row vector, so if we mean a column vector many people would prefer to write it transposed.↩︎
The inverse of a matrix is analogous to dividing a variable by itself, since it leads to that variable canceling out and thus simplifying the equation. However, strictly speaking there is no operation that qualifies as division in the matrix world.↩︎
For a matrix **A** to be invertible, there must exist another matrix **A**⁻¹ such that **A**⁻¹**A** = **AA**⁻¹ = **I**. However, this definition doesn’t offer any clues about how we might find the inverse.↩︎
In truth, there are other ways to solve **Ax** = **b** that don’t require inversion of a matrix. However, if a matrix isn’t invertible, these other methods will also break down. We’ll demonstrate this later when we talk about the LU decomposition.↩︎
A very good discussion of the algebraic approach is available here.↩︎
This is another example of an infelicity of symbol conventions. The typical math/statistics text symbols are not the same as the symbols a student in Physics 101 would likely encounter.↩︎
The careful reader will note that the data set shown in Equation 9 is not square; there are more observations (rows) than variables (columns). This is fine and desirable for a linear regression; we don’t want to use just two data points, as that would have no error but not necessarily be accurate. However, only square matrices have inverses, so what’s going on here? In practice, what’s happening is we are using something called a pseudoinverse. The first part of the right side of Equation 11 is in fact the pseudoinverse: (**X**ᵀ**X**)⁻¹**X**ᵀ. Perhaps we’ll cover this in a future post.↩︎
The switch in the order of matrices on the last line of Equation 12 is one of the properties of the inverse operator.↩︎
This means that the rank of the matrix is less than the number of columns. You can get the rank of a matrix by counting the number of nonzero eigenvalues via eigen(A2)$values, which in this case gives 8.9330344, 5.9330344 and 3.5953271 × 10⁻¹⁶ (effectively zero). There are only two nonzero values, so the rank is two. Perhaps in another post we’ll discuss this in more detail.↩︎
@online{hanson2022,
author = {Hanson, Bryan},
title = {Notes on {Linear} {Algebra} {Part} 2},
date = {2022-09-01},
url = {http://chemospec.org/posts/20220901LinearAlgNotesPt2/LinearAlgNotesPt2.html},
langid = {en}
}
R
, read no further and do something else!
If you are like me, you’ve had no formal training in linear algebra, which means you learn what you need to when you need to use it. Eventually, you cobble together some hardwon knowledge. That’s good, because almost everything in chemometrics involves linear algebra.
This post is essentially a set of personal notes about the dot product and the cross product, two important manipulations in linear algebra. I’ve tried to harmonize things I learned way back in college physics and math courses, and to integrate information I’ve found in various sources I have leaned on more recently. Without a doubt, the greatest impediment to really understanding this material is the use of multiple terminologies and notations. I’m going to try really hard to be clear and to the point in my discussion.
The main sources I’ve relied on are:
Let’s get started. For sanity and consistency, let’s define two 3D vectors and two matrices to illustrate our examples. Most of the time I’m going to write vectors with an arrow over the name, as a nod to the treatment usually given in a physics course. This reminds us that we are thinking about a quantity with direction and magnitude in some coordinate system, something geometric. Of course, in the R language a vector is simply a list of numbers with the same data type; R doesn’t care if a vector is a vector in the geometric sense or a list of states.
The dot product goes by these other names: inner product, scalar product. Typical notations include:^{1}
There are two main formulas for the dot product with vectors, the algebraic formula (Equation 5) and the geometric formula (Equation 6).
‖a‖ refers to the L² or Euclidean norm, namely the length of the vector:^{2}
The result of the dot product is a scalar. The dot product is also commutative: **a** · **b** = **b** · **a**.
From the perspective of matrices, if we think of **a** and **b** as column vectors with dimensions 3 x 1, then transposing **a** gives us conformable matrices, and we find the result of matrix multiplication is the dot product (compare to Equation 5):
Even though this is matrix multiplication, the answer is still a scalar.
Now, rather confusingly, if we think of **a** and **b** as row vectors, and we transpose **b**, then we get the dot product:
Equations 8 and 9 can be a source of real confusion at first. They give the impression that the dot product can be either **a**ᵀ**b** or **ab**ᵀ. However, this is only true in the limited contexts defined above. To summarize: with column vectors the dot product is **a**ᵀ**b**; with row vectors it is **ab**ᵀ.
Unfortunately I think this distinction is not always clearly made by authors, and is a source of great confusion to linear algebra learners. Be careful when working with row and column vectors.
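The two limited contexts can be made concrete with explicit 3 x 1 and 1 x 3 matrices (a sketch with arbitrary numbers):

```r
u_col <- matrix(1:3, ncol = 1)  # column vector, 3 x 1
v_col <- matrix(4:6, ncol = 1)

# column vectors: the dot product is t(u) %*% v  (1 x 3 times 3 x 1 -> 1 x 1)
t(u_col) %*% v_col        # 32

u_row <- matrix(1:3, nrow = 1)  # row vector, 1 x 3
v_row <- matrix(4:6, nrow = 1)

# row vectors: the dot product is u %*% t(v)  (1 x 3 times 3 x 1 -> 1 x 1)
u_row %*% t(v_row)        # 32
```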
Suppose we wanted to compute the matrix product **AB**.^{3} We use the idea of row and column vectors to accomplish this task. In the process, we discover that matrix multiplication is a series of dot products:
The red color shows how the dot product of the first row of **A** and the first column of **B** gives the first entry in **AB**. Every entry in **AB** results from a dot product. Every entry is a scalar, embedded in a matrix.
The cross product goes by these other names: outer product^{4}, tensor product, vector product.
The cross product of two vectors returns a vector rather than a scalar. Vectors are defined in terms of a basis, which is a coordinate system. Earlier, when we defined **a**, it was intrinsically defined in terms of the standard basis set (in some fields this would be called the unit coordinate system). Thus a fuller definition of **a** would be:
In terms of vectors, the cross product is defined as:
In my opinion, this is not exactly intuitive, but there is a pattern to it: notice that each component of the result involves only the other two components of the inputs. The details of how this result is computed rely on some properties of the basis set; this Wikipedia article has a nice explanation. We need not dwell on it, however.
There is also a geometric formula for the cross product:
where **n̂** is the unit vector perpendicular to the plane defined by **a** and **b**. The direction of **n̂** is defined by the right-hand rule. Because of this, the cross product is not commutative, i.e. **a** × **b** ≠ **b** × **a**. The cross product is, however, anticommutative:
As we did for the dot product, we can look at the cross product from the perspective of column vectors. Instead of transposing the first matrix as we did for the dot product, we transpose the second one:
Interestingly, we are using the dot product to compute the cross product.
The case where we treat **a** and **b** as row vectors is left to the reader.^{5}
Finally, there is a matrix definition of the cross product as well. Evaluation of the following determinant gives the cross product:
%*%
The workhorse for matrix multiplication in R is the %*% function. This function will accept any combination of vectors and matrices as inputs, so it is flexible. It is also smart: given a vector and a matrix, the vector will be treated as a row or column matrix as needed to ensure conformability, if possible. Let’s look at some examples:
# Some data for examples
p <- 1:5
q <- 6:10
M <- matrix(1:15, nrow = 3, ncol = 5)
M
[,1] [,2] [,3] [,4] [,5]
[1,] 1 4 7 10 13
[2,] 2 5 8 11 14
[3,] 3 6 9 12 15
# A vector times a vector
p %*% q
[,1]
[1,] 130
Notice that R returns a data type of matrix, but it is a 1 x 1 matrix, and thus a scalar value. That means we just computed the dot product, a decision R made internally. We can verify this by noting that q %*% p gives the same answer. Thus, R handled these vectors as column vectors and computed t(p) %*% q.
# A vector times a matrix
M %*% p
[,1]
[1,] 135
[2,] 150
[3,] 165
As M had dimensions 3 x 5, R treated p as a column vector in order to be conformable. The result is a vector, so this is the cross product.
If we try to compute p %*% M we get an error, because there is nothing R can do to p which will make it conformable to M.
p %*% M
Error in p %*% M: nonconformable arguments
What about multiplying matrices?
M %*% M
Error in M %*% M: nonconformable arguments
As you can see, when dealing with matrices, %*% will not change a thing, and if your matrices are nonconformable then it’s an error. Of course, if we transpose either instance of M we do have conformable matrices, but the answers are different, and this is neither the dot product nor the cross product, just matrix multiplication.
t(M) %*% M
[,1] [,2] [,3] [,4] [,5]
[1,] 14 32 50 68 86
[2,] 32 77 122 167 212
[3,] 50 122 194 266 338
[4,] 68 167 266 365 464
[5,] 86 212 338 464 590
M %*% t(M)
[,1] [,2] [,3]
[1,] 335 370 405
[2,] 370 410 450
[3,] 405 450 495
What can we take from these examples?

- R will give you the dot product if you give it two vectors. Note that this is a design decision, as it could have returned the cross product (see Equation 14).
- R will promote a vector to a row or column vector if it can, to make it conformable with a matrix you provide. If it cannot, R will give you an error. If it can, the cross product is returned.
- When multiplying two matrices, R will give an error if they are not conformable.
- One operator, %*%, does it all: dot product, cross product, or matrix multiplication, but you need to pay attention.

There are other R functions that do some of the same work:

- crossprod(M), equivalent to t(M) %*% M but faster.
- tcrossprod(M), equivalent to M %*% t(M) but faster.
- outer(), also available as the %o% operator.

The first two functions will accept combinations of vectors and matrices, as does %*%. Let’s try it with two vectors:
crossprod(p, q)
[,1]
[1,] 130
Huh. crossprod() is returning the dot product! So this is the case where “the cross product is not the cross product.” From a clarity perspective, this is not ideal. Let’s try the other function:
tcrossprod(p, q)
[,1] [,2] [,3] [,4] [,5]
[1,] 6 7 8 9 10
[2,] 12 14 16 18 20
[3,] 18 21 24 27 30
[4,] 24 28 32 36 40
[5,] 30 35 40 45 50
There’s the cross product!
What about outer()? Remember that another name for the cross product is the outer product. So is outer() the same as tcrossprod()? In the case of two vectors, it is:
identical(outer(p, q), tcrossprod(p, q))
[1] TRUE
What about a vector with a matrix?
tst <- outer(p, M)
dim(tst)
[1] 5 3 5
Alright, that clearly is not a cross product. The result is an array with dimensions 5 x 3 x 5, not a matrix (which would have only two dimensions). outer does correspond to the cross product in the case of two vectors, but anything with higher dimensions gives a different beast. So perhaps using “outer” as a synonym for cross product is not a good idea.
Given what we’ve seen above, make your life simple and stick to %*%, and pay close attention to the dimensions of the arguments, especially if row or column vectors are in use. In my experience, thinking about the units and dimensions of whatever it is you are calculating is very helpful. Later, if speed is really important in your work, you can use one of the faster alternatives.
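To make the promotion rule concrete, here is a small sketch. The objects are inferred from the printed results above (M appears to be matrix(1:15, nrow = 3), a 3 x 5 matrix, and p appears to be 1:5):

```r
# Objects inferred from the outputs shown earlier in this post
M <- matrix(1:15, nrow = 3) # 3 x 5
p <- 1:5

# p is promoted to a 5 x 1 column vector so the product conforms
M %*% p # a 3 x 1 matrix

# a length-3 vector on the left is promoted to a 1 x 3 row vector
c(1, 2, 3) %*% M # a 1 x 5 matrix

# no promotion can make this conform, so it is an error
# p %*% M
```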
An extensive discussion of notations can be found here.↩︎
And curiously, the norm works out to be equal to the square root of the dot product of a vector with itself: $\|\vec{u}\| = \sqrt{\vec{u} \cdot \vec{u}}$.↩︎
To be multiplied, matrices must be conformable, namely the number of columns of the first matrix must match the number of rows of the second matrix. The reason is so that the dot product terms will match. In the present case we have .↩︎
Be careful, it turns out that “outer” may not be a great synonym for cross product, as explained later.↩︎
OK fine, here is the answer when treating $\vec{p}$ and $\vec{q}$ as row vectors: $\vec{p}^{\,T}\vec{q}$, which expands exactly as the right-hand side of Equation 14.↩︎
@online{hanson2022,
author = {Hanson, Bryan},
title = {Notes on {Linear} {Algebra} {Part} 1},
date = {2022-08-14},
url = {http://chemospec.org/posts/20220814LinearAlgNotes/20220814LinearAlgNotes.html},
langid = {en}
}
A few days ago I pushed a major update, and at this point Python
packages outnumber R
packages more than two to one. The update was made possible because I recently had time to figure out how to search the PyPi.org site automatically.
In a previous post I explained the methods I used to find packages related to spectroscopy. These have been updated considerably and the rest of this post will cover the updated methods.
There are four places I search for packages related to spectroscopy:^{1}

- CRAN, searched with the packagefinder package.^{2}
- Github
- PyPi.org
- juliapackages.org

The topics I search are as follows:
I search CRAN using packagefinder; the process is quite straightforward and won’t be covered here. However, it is not an automated process (I should probably work on that).
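For the record, a manual CRAN search is nearly a one-liner; this is a sketch using findPackage(), the main search function of packagefinder (check its documentation for current arguments):

```r
# Search CRAN interactively for spectroscopy-related packages
library("packagefinder")
findPackage("spectroscopy")
```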
The broad approach used to search Github is the same as described in the original post. However, the scripts have been refined and updated, and now exist as functions in a new package I created called webu (for “web utilities”, but that name is taken on CRAN). The repo is here. webu is not on CRAN and I don’t currently intend to put it there, but you can of course install it from the repo if you wish to try it out.
Searching Github is now carried out by a supervising script called /utilities/run_searches.R
(in the FOSS4Spectroscopy
repo). The script contains some notes about finicky details, but is pretty simple overall and should be easy enough to follow.
Unlike Github, PyPi.org does not require authentication to use its API, which makes things simpler. The needed functions are in webu and include some deliberate delays so as not to overload their servers. As with Github, searches are supervised by /utilities/run_searches.R.
One thing I observed at PyPi.org is that authors do not always fill out all the fields that PyPi.org can accept, which means some fields are NULL and we have to trap for that possibility. Package information is accessed via a JSON record; for instance, the entry for nmrglue can be seen here. This package is pretty typical in that the author_email field is filled out, but the maintainer_email field is not (they are presumably the same). If one considers these JSON files to be analogous to DESCRIPTION in R packages, it looks like there is less oversight on PyPi.org compared to CRAN.
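As a sketch of the NULL-trapping involved, here is a minimal example that pulls a JSON record from the PyPi.org API with jsonlite. The helper function is my own illustration, not code from webu; the field names match the nmrglue record discussed above:

```r
library("jsonlite")

# Fetch the JSON record for a package from https://pypi.org/pypi/<pkg>/json
# and extract the e-mail fields, trapping for fields the author left empty.
get_pypi_contacts <- function(pkg) { # hypothetical helper, not from webu
  info <- fromJSON(paste0("https://pypi.org/pypi/", pkg, "/json"))$info
  grab <- function(x) if (is.null(x) || identical(x, "")) NA_character_ else x
  c(author_email = grab(info$author_email),
    maintainer_email = grab(info$maintainer_email))
}

get_pypi_contacts("nmrglue")
```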
Julia packages are readily searched manually at juliapackages.org.
The raw results from the searches described above still need a lot of inspection and cleaning to be usable. The PyPi.org and Github results are saved in an Excel worksheet with the relevant URLs. These links can be followed to determine the suitability of each package. In the /Utilities folder there are additional scripts to remove entries that are already in the main database (FOSS4Spec.xlsx), as well as to check the names of the packages: Python naming policies allow different packages with names differing only by case, and authors are sometimes sloppy when referring to their own packages, using mypkg in one place and myPkg in another for the same package.
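Flagging names that differ only by case is straightforward if one compares lowercased names; a minimal sketch (the helper name is mine, not one of the FOSS4Spectroscopy utility scripts):

```r
# Report package names that collide once case is ignored
flag_case_collisions <- function(pkg_names) {
  lowered <- tolower(pkg_names)
  pkg_names[duplicated(lowered) | duplicated(lowered, fromLast = TRUE)]
}

flag_case_collisions(c("mypkg", "myPkg", "nmrglue")) # flags the first two
```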
Once in a while users submit their own package to the repo, and I also find interesting packages in my literature reading.↩︎
packagefinder has recently been archived, but hopefully will be back soon.↩︎
@online{hanson2022,
author = {Hanson, Bryan},
title = {FOSS4Spectroscopy: {R} Vs {Python}},
date = {2022-07-06},
url = {http://chemospec.org/posts/20220706F4SUpdate/20220706F4SUpdate.html},
langid = {en}
}
I’m pleased to announce that my colleague David Harvey and I have recently released LearnPCA, an R package to help people understand PCA. In LearnPCA we’ve tried to integrate our years of experience teaching the topic, along with the best insights we can find in books, tutorials and the nooks and crannies of the internet. Though our experience is in a chemometrics context, we use examples from different disciplines so that the package will be broadly helpful.
The package contains seven vignettes that proceed from the conceptual basics to advanced topics. As of version 0.2.0, there is also a Shiny app to help visualize the process of finding the principal component axes. The current vignettes are:
You can access the vignettes at the Github Site; you don’t even have to install the package. For the Shiny app, do the following:
install.packages("LearnPCA") # you'll need version 0.2.0
library("LearnPCA")
PCsearch()
We would really appreciate your feedback on this package. You can do so in the comments below, or open an issue.
@online{hanson2022,
author = {Hanson, Bryan},
title = {Introducing {LearnPCA}},
date = {2022-05-03},
url = {http://chemospec.org/posts/20220503LearnPCAIntro/20220503LearnPCAIntro.html},
langid = {en}
}
If you aren’t familiar with ChemoSpec
, you might wish to look at the introductory vignette first.
In this series of posts we are following the protocol as described in the printed publication closely (Blaise et al. 2021). The authors have also provided a Jupyter notebook. This is well worth your time, even if Python is not your preferred language, as there are additional examples and discussion for study.
Load the Spectra
object we created in Part 2 so we can summarize it.
library("ChemoSpec")
load("Worms2.RData") # restores the 'Worms2' Spectra object
sumSpectra(Worms2)
C. elegans metabolic phenotyping study (Blaise 2007)
There are 133 spectra in this set.
The yaxis unit is intensity.
The frequency scale runs from
8.9995 to 5e-04 ppm
There are 8600 frequency values.
The frequency resolution is
0.001 ppm/point.
This data set is not continuous
along the frequency axis.
Here are the data chunks:
beg.freq end.freq size beg.indx end.indx
1 8.9995 5.0005 3.999 1 4000
2 4.5995 0.0005 4.599 4001 8600
The spectra are divided into 4 groups:
group no. color symbol alt.sym
1 Mut_L2 28 #FB0D16FF 0 m2
2 Mut_L4 33 #FFC0CBFF 15 m4
3 WT_L2 32 #511CFCFF 1 w2
4 WT_L4 40 #2E94E9FF 16 w4
*** Note: this is an S3 object
of class 'Spectra'
If you recall, in Part 2 we removed five samples. Let’s rerun PCA without these samples and show the key plots. We will simply report these here without much discussion; they are pretty much as expected.
c_pca <- c_pcaSpectra(Worms2, choice = "autoscale")
plotScree(c_pca)
p <- plotScores(Worms2, c_pca, pcs = 1:2, ellipse = "rob", tol = 0.02)
p
p <- plotScores(Worms2, c_pca, pcs = 2:3, ellipse = "rob", leg.loc = "bottomleft",
tol = 0.02)
p
One thing the published protocol does not explicitly discuss is an inspection of the loadings, but it is covered in the Jupyter notebook. The loadings are useful in order to see if any particular frequencies are driving the separation of the samples in the score plot. Let’s plot the loadings (Figure 4). Remember that these data were autoscaled, and hence all frequencies, including noisy frequencies, will contribute to the separation. If we had not scaled the data, these plots would look dramatically different.
p <- plotLoadings(Worms2, c_pca, loads = 1:2)
p
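To see how much the scaling choice matters, one could rerun the PCA without scaling and redraw the loadings; a quick sketch ("noscale" is among the choices accepted by c_pcaSpectra):

```r
# Rerun PCA with no scaling and compare the loadings to the autoscaled case
c_pca_ns <- c_pcaSpectra(Worms2, choice = "noscale")
p <- plotLoadings(Worms2, c_pca_ns, loads = 1:2)
p
```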
The s-plot is another very useful way to find peaks that are important in separating the samples (Figure 5); we can see that the peaks around 1.30-1.32, 1.47-1.48, and 3.03-3.07 ppm are important drivers of the separation in the score plot. Having discovered this, one can investigate the source of those peaks.
p <- sPlotSpectra(Worms2, c_pca, tol = 0.001)
p
ChemoSpec carries out exploratory data analysis, which is an unsupervised process. The next step in the protocol is PLS-DA (partial least squares discriminant analysis). I have written about ChemoSpec + PLS here if you would like more background on plain PLS. However, PLS-DA is a technique that combines data reduction/variable selection along with classification. We’ll need the mixOmics package (F et al. (2017)) for this analysis; note that loading it replaces the plotLoadings function from ChemoSpec.
library("mixOmics")
Loading required package: MASS
Loading required package: lattice
Loaded mixOmics 6.20.0
Thank you for using mixOmics!
Tutorials: http://mixomics.org
Bookdown vignette: https://mixomicsteam.github.io/Bookdown
Questions, issues: Follow the prompts at http://mixomics.org/contactus
Cite us: citation('mixOmics')
Attaching package: 'mixOmics'
The following object is masked from 'package:ChemoSpec':
plotLoadings
Figure 6 shows the score plot; the results suggest that classification and modeling may be successful. The splsda function carries out a single sparse computation. One computation should not be considered the ideal answer; a better approach is to use cross-validation, for instance the bootsPLS function in the bootsPLS package (Rohart, Le Cao, and Wells (2018), which uses splsda under the hood). However, that computation is too time-consuming to demonstrate here.
X <- Worms2$data
Y <- Worms2$groups
splsda <- splsda(X, Y, ncomp = 8)
plotIndiv(splsda,
col.per.group = c("#FB0D16FF", "#FFC0CBFF", "#511CFCFF", "#2E94E9FF"),
title = "sPLSDA Score Plot", legend = TRUE, ellipse = TRUE)
To estimate the number of components needed, the perf function can be used. The results are in Figure 7 and suggest that five components are sufficient to describe the data.
perf.splsda <- perf(splsda, folds = 5, nrepeat = 5)
plot(perf.splsda)
At this point, we have several ideas of how to proceed. Going forward, one might choose to focus on accurate classification, or on determining which frequencies should be included in a predictive model. Any model will need to be refined and more details extracted. The reader is referred to the case study from the mixOmics folks, which covers these tasks and explains the process.
This post was created using ChemoSpec version 6.1.3 and ChemoSpecUtils version 1.0.0.
@online{hanson2022,
author = {Hanson, Bryan},
title = {Metabolic {Phenotyping} {Protocol} {Part} 3},
date = {2022-05-01},
url = {http://chemospec.org/posts/20220501ProtocolPt3/20220501ProtocolPt3.html},
langid = {en}
}