CLI Tutorial

In this tutorial, we’ll look at how to use the CLI interface and config files to produce a sensitivity estimate for the Phase 2 MWA array.

We will assume throughout that you are working in a directory containing the following files:

$ ls
mwa_phase2_compact_antpos.txt  observation_mwa.yml  observatory_mwa.yml  sensitivity_mwa.yml

These are all the configuration and data files we will need to perform the sensitivity estimate. You can obtain them here.

Antenna Positions

In general you do not need a file of antenna positions – you could alternatively specify a function to generate them (eg. see the default example observatory config). Typically, however, you will have a given list of positions. These can be in the format of a .npy array, a pickle file, or a simple ASCII file. In any case, the array must be of shape (Nant, 3), where the three columns are East, North and Elevation, all in units of metres (actually, if its a pickle file, the array could be an astropy Quantity, and specify its own units of length). Of course, East-North-Up (ENU) are specified relative to some assumed array centre location.

The last ten lines of our MWA tile positions look like:

$ tail mwa_phase2_compact_antpos.txt
-2.971999999999999975e+00 1.220580000000000069e+02 3.763199999999999932e+02
-1.698900000000000077e+01 1.220699999999999932e+02 3.764599999999999795e+02
3.995999999999999996e+00 1.099509999999999934e+02 3.762200000000000273e+02
-9.980000000000000426e+00 1.099440000000000026e+02 3.763299999999999841e+02
-2.399399999999999977e+01 1.099369999999999976e+02 3.764900000000000091e+02
5.301899999999999835e+01 1.220520000000000067e+02 3.758100000000000023e+02
-1.698699999999999832e+01 9.781000000000000227e+01 3.763600000000000136e+02
4.602199999999999847e+01 1.099489999999999981e+02 3.758100000000000023e+02
3.202400000000000091e+01 1.099440000000000026e+02 3.759599999999999795e+02
1.802499999999999858e+01 1.099590000000000032e+02 3.760699999999999932e+02

Observatory

The observatory configuration specifies attributes of your observatory, like the antenna positions and the beam. All possible parameters are described here Our observatory configuration looks like:

$ cat observatory_mwa.yml
antpos: !txt mwa_phase2_compact_antpos.txt
beam:
    class: GaussianBeam
    frequency: !astropy.units.Quantity
        unit: !astropy.units.Unit {unit: MHz}
        value: 150
    dish_size: !astropy.units.Quantity
        unit: !astropy.units.Unit {unit: m}
        value: 4.0
latitude: !astropy.units.Quantity
    unit: !astropy.units.Unit {unit: rad}
    value: -0.4660608447416767  # In radians
Trcv: !astropy.units.Quantity
    unit: !astropy.units.Unit {unit: mK}
    value: 100000.0

Notice that antpos just points to our positions file. Its location is assumed to be relative to the location of the configuration file, unless it is an absolute path. The YAML-tag “!txt” tells it to load the file as a standard ASCII txt file.

The beam is specified by a dictionary. The “class” tells it what kind of beam to use (at this point, only the GaussianBeam is provided), while the frequency is specified in MHz and the dish_size in metres. Note these are set using the standard astropy YAML-tags to create units. You must specify units for all quantities.

In 21cmSense, all calculations are performed at a single frequency – expected to be the central frequency of the observation. It is expected that the bandwidth (to be discussed later) is small enough around this frequency that the cosmic signal is not evolving significantly over the band.

The dish_size sets a number of physical limits in the calculation. It sets the size of the beam, which in turn sets the UV-plane resolution, and the duration of a coherent LST-bin (to be discussed further below).

The observatory also requires a latitude. This is required to determine how sky rotation affects the modes that are observed by a given baseline over the course of a night. We do not need a longitude without loss of generality.

Finally, Trcv sets the receiver temperature, in mK. Typically, this is not highly influential, since the sky temperature is the dominant source of thermal noise.

Observation

Now that we’ve specified the attributes of the observatory, we can go on to specifying attributes of our particular set of observations. All possible options are described here. Here’s what the config looks like:

$ cat observation_mwa.yml
observatory: observatory_mwa.yml
integration_time: !astropy.units.Quantity
    unit: !astropy.units.Unit {unit: s}
    value: 60.0
time_per_day: !astropy.units.Quantity
    unit: !astropy.units.Unit {unit: hour}
    value: 7
n_days: 180
n_channels: 100
bandwidth: !astropy.units.Quantity
    unit: !astropy.units.Unit {unit: MHz}
    value: 10
coherent: true
bl_max: !astropy.units.Quantity
    unit: !astropy.units.Unit {unit: m}
    value: 100.0

The first thing to note here is that we’re specifying the observatory that we looked at in the previous section. In fact, the observatory could have been fully specified directly in this file, by making it a dictionary (with all the key: value pairs from our observatory_mwa.yml file).

The next three parameters all have to do with how much time we are observing for. There is a good reason they are split into three parts. The integration_time specifies the number of seconds that the telescope integrates observations to yield a single snapshot. All data within this time frame is averaged coherently (per baseline) whether that’s a good idea or not. In addition, observations falling within a single UV cell and within an LST-bin are averaged together coherently. By default, an LST-bin is considered to be the length of time it takes for a point on the sky to travel through the FWHM of the beam. This is an approximation. You can set the length of the LST-bin directly in the observation.yml by setting observation_duration (in minutes). The interpretation is that within this time-frame, the sky has not changed significantly, and that if a baseline stays in the same UV cell (remember, its UV co-ordinate changes with time), it should be averaged coherently. The time_per_day then effectively gives the number of such LST-bins that are observed during the night. Different LST-bins are averaged incoherently, even in the same UV cell. Finally, n_days gives the total number of days for which these kinds of observations are averaged. Each day, the same sky appears again, and these are assumed to be able to be added coherently.

Thus, the integration time for a given UV-cell from a single baseline is approximately

\[n_{\rm days} * \sqrt{\frac{\rm hours}{\rm day} \frac{1}{t_{\rm LST}}}.\]

In detail, this is modified slightly by rotation of the sky within an LST-bin, and how finely that duration is sampled (i.e. integration_time), but these are second-order effects, and are baseline-dependent.

The number of channels and bandwidth purely affect the range of parallel $k$-modes probed. We note again that the bandwidth here is not meant to be the entire bandwidth of the instrument (hence it is not included in the Observatory), rather it is the bandwidth over which the cosmic signal is relatively stationary. Smaller bandwidths lead to fewer low-$k$ modes observed, while smaller number of channels lead to fewer high-$k$ modes. This can affect overall sensitivity, but typically not dramatically (and not for a particular $k$-mode).

The coherent parameter specifies whether different baselines, if they fall into the same UV cell in the same LST-bin, should be averaged coherently. This can have quite an impact for certain layouts. Note that baselines that are considered redundant, i.e. they have the same vector to within some user-specified tolerance, are always averaged coherently.

Finally, the bl_max parameter specifies the maximum baseline length to include in the analysis, in metres. We reduce this to 100 since longer baselines are more prone to systematics, and do not add a great deal of sensitivity.

Gridding Baselines

Algorithmically, the first thing to do is to grid the baselines onto the UV plane. You do not have to do this manually, but it can be useful to do so, in order to create an intermediate product that can be investigated and re-used in further calculations.

Let’s do this:

$ sense grid-baselines observation_mwa.yml
finding redundancies: 100%|███████████████████████████████████████████████████████| 127/127 [00:00<00:00, 408.55ants/s]
computing UVWs: 100%|█████████████████████████████████████████████████████████████| 118/118 [00:03<00:00, 38.14times/s]
gridding baselines: 100%|████████████████████████████████████████████████| 2586/2586 [00:00<00:00, 10307.26baselines/s]
There are 2586 baseline types
Saving array file as ./blmin0_blmax100_0.155GHz_observation.h5

As we can see, the code first finds baseline redundancies, up to the default tolerance. Doing this mostly acts to improve performance in the following baseline gridding, especially for highly redundant arrays.

Following this, the code grids the baselines. Essentially, it determines the UV-coordinate of each baseline for each integration time within an LST (the centre of each LST bin has the array phased to zenith, and it tracks around this point throughout the bin). It then adds the number of redundant baselines in that group to that particular UV cell.

The important output information here is the array file, which we will have to use in our sensitivity analysis. This file is in fact a serialized version of the entire Observation class, and can be loaded into a python interpreter using hickle, and its data can be read using h5py directly. Essentially, it is just the UV grid.

Sensitivity

The final configuration file required is sensitivity_mwa.yml. Let’s look at this:

$ cat sensitivity_mwa.yml
observation: observation_mwa.yml
horizon_buffer: !astropy.units.Quantity
    unit: !astropy.units.Unit {unit: 1/Mpc}
    value: 0.1
theory_model: EOS2021

Here the observation is of course the previously-specified file. Again, we could have specified the observation directly in this file, but it helps to separate it in order to run the gridding separately.

Note

The observation entry to the YAML here is only used if calc-sense is called without specifying the --array-file. If the --array-file is specified it fully defines the observation. If calc-sense is called without --array-file, then the full sensitivity calculation is done in one hit, without writing the intermediate array file, using the given ``observation: `` parameter.

The horizon_buffer specifies a region of kparallel which gets thrown out due to assumed high level of foregrounds, if foreground_model is moderate (which is the default). This is in addition to the horizon line. For small baselines, this effectively sets a “bar” below which all $k_{||}$ are thrown out. Its units are h/Mpc.

Finally, theory_model specifies a theoretical model to use in order to calculate the sample (cosmic) variance. The default value is to use EOS2021, which is specified over a broad range of redshifts and scales. This can be changed to your own theoretical model quite simply (see the FAQs for more information).

Now we run the sensitivity analysis:

$ sense calc-sense sensitivity_mwa.yml --array-file blmin0_blmax100_0.155GHz_observation.h5
[14:03:33] WARNING  The maximum k value is being restricted by the theoretical signal model. Losing ~23 bins.                                                                                     sensitivity.py:212
           INFO     Used 76 bins between 0.05390475507505667 littleh / Mpc and 4.096761385704307 littleh / Mpc                                                                                            cli.py:139
           INFO     Getting Thermal Variance                                                                                                                                                      sensitivity.py:367
calculating 2D sensitivity: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 978/978 [00:01<00:00, 555.91uv-bins/s]
averaging to 1D: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 249/249 [00:00<00:00, 5112.95kperp-bins/s]
[14:03:35] INFO     Getting Sample Variance                                                                                                                                                       sensitivity.py:370
averaging to 1D: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 249/249 [00:00<00:00, 5092.09kperp-bins/s]
           INFO     Getting Combined Variance                                                                                                                                                     sensitivity.py:364
averaging to 1D: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 249/249 [00:00<00:00, 5136.87kperp-bins/s]
           INFO     Writing sensitivies to 'moderate_150.000 MHz.h5'                                                                                                                              sensitivity.py:533
           INFO     Significance of detection: 0.12369660966702764

This command also outputs a file moderate_155.000 MHz.h5, which contains the standard deviation of the dimensionless power spectrum. The output file also includes the 1D k values corresponding to the sensitivity arrays. By default, a simple plot is made of the 1D PS uncertainty, and is written to the file moderate_155.000 MHz.png. A prefix can be prepended to these filenames by using the --prefix option, and the plotting can be turned off by setting --no-plot.