
2.4.1 Conclusions

SERPent automates the reduction, flagging and preparation procedures of post-correlation radio interferometric datasets, specifically those from e-MERLIN. SERPent is being tested on EVN and Global VLBI datasets, with good early results. SERPent is written in ParselTongue, a scripting language widely used with the EVN, so that the user can start flagging data already loaded within the AIPS environment with relative ease. SERPent can be easily added to existing and future pipelines.

The entire SERPent program consists of only two text files. The first is the main SERPent code to be executed, and the second is a user input file designed so that the user does not have to interact with the main body of code. The input file also makes the input parameters obvious and therefore intuitive to set. This gives users the freedom to pursue their own flagging philosophy, i.e. whether they want to be aggressive or conservative with the flagging, while also providing a set of default inputs which will perform well on most datasets.

SERPent is designed to be run on ‘high-end desktop’ computer systems. The examples in this thesis used a system with 16 CPUs and 100 GB of memory (Leviathan), which flagged at ∼ 110 GB/day. This throughput will increase with full e-MERLIN Legacy data, because full bandwidth increases the number of ‘jobs’ and therefore allows a larger number of CPUs to be used effectively. It is unlikely that one will be able to process full e-MERLIN Legacy data on a modest desktop computer. Although larger computing facilities bring obvious advantages and smaller systems have real-world limitations, SERPent can be used by institutions without access to supercomputer clusters.

Section 2.3 has demonstrated that SERPent can reduce and flag current e-MERLIN commissioning data, which will have many more complications than data from a stable, fully commissioned e-MERLIN, including the Legacy datasets. The benefit of using real data instead of simulated data is obvious, and SERPent is now part of the official pipeline for e-MERLIN, used at Jodrell Bank and by other international groups.

In the wider context of this thesis and indeed the astrophysics community, more general conclusions can be gleaned from the experience of creating and developing SERPent. When testing and verifying the performance of any piece of software, real data is imperative. No matter how many times some software is run on simulated data or even real observations, a new real dataset will almost certainly be different. This may lead to the discovery of bugs or a change in performance. This statement is magnified in the case of e-MERLIN, where much of the upgrade process was conducted without any modelling of antenna responses etc. during the commissioning phase.

During the first release of SERPent to the international community, the e-MERLIN correlator’s behaviour changed, and the introduction of previously absent NaNs or empty visibilities from the correlator affected the performance of all of SERPent’s passages. The only way to minimise the possibility of this is to test on as many real datasets as possible, to make the software robust and reduce the chance of failure on any given future run. In other words, you can never anticipate everything a real dataset (or a small selection of datasets) will throw at you.
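To illustrate the kind of guard that NaNs and empty visibilities call for, the following is a minimal sketch, not SERPent's actual code. It assumes amplitudes arrive as a flat list, treats an exactly zero amplitude as an "empty" visibility (an assumption), and uses the hypothetical helper names `mask_invalid` and `robust_stats`.

```python
import math
from statistics import median


def mask_invalid(amplitudes):
    """Flag samples that are NaN or exactly zero (assumed to mean 'empty')."""
    return [math.isnan(a) or a == 0.0 for a in amplitudes]


def robust_stats(amplitudes, flags):
    """Median and median absolute deviation (MAD) of the unflagged samples.

    Returns (None, None) if every sample is flagged, so that the caller
    can skip the block instead of thresholding on meaningless statistics.
    """
    good = [a for a, f in zip(amplitudes, flags) if not f]
    if not good:
        return None, None
    med = median(good)
    mad = median(abs(a - med) for a in good)
    return med, mad


amps = [1.0, float("nan"), 2.0, 0.0, 3.0]
flags = mask_invalid(amps)
med, mad = robust_stats(amps, flags)
```

Masking invalid samples before any statistics are computed keeps a single NaN from propagating into the median and MAD on which the thresholds depend.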

Another conclusion from developing software like SERPent is that, to get people to use any new piece of software, it has to have one or more of the following characteristics: it must be simple to use and run; contain a method which is more advanced than current methods or functions; be robustly tested to minimise failures; and be modular or integrated with current packages and formats so that users do not have to learn or modify existing code.

SERPent meets most, if not all, of these criteria. The only limitation is the testing issue above: SERPent is stable and working well, but is still a relatively new piece of software. Otherwise, being written in Python/ParselTongue with only two files, it requires no compilation and is easy to use and interact with. It also benefits from Python’s modular nature and is part of COBRaS (see Chapter 3) and e-MERLIN’s calibration pipeline. Finally, SERPent uses the most advanced existing method of RFI mitigation via thresholding (studies by Offringa et al. 2010a), along with passages dealing with e-MERLIN-specific issues.

2.4.2 Discussion

When constructing an automated flagging script, the flagging philosophy has to be considered and decided. Whilst flagging all of the RFI and none of the good data is the ideal scenario, even when implementing the SumThreshold method with an extremely low false-positive detection percentage, either some RFI will remain or some good data will be flagged. This is the reality of working with real datasets from imperfect instruments and environments.

Discussions at the e-MERLIN early science meeting (Manchester, 11th - 12th April 2014) highlighted that some of the strong narrowband RFI in the L-band observations contaminates neighbouring channels via ringing. This has been seen to spread RFI across the entire IF at certain times. Applying Hanning smoothing to the channels before running SERPent has been suggested as a way to reduce the impact of ringing and improve the performance of both SERPent and calibration (Simon Garrington; private communication).
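Hanning smoothing replaces each channel with a weighted average of itself and its two neighbours, using weights (0.25, 0.5, 0.25). The sketch below is illustrative only. It is not an AIPS or SERPent routine; the function name `hanning_smooth` is hypothetical, and leaving the two edge channels unchanged is an assumed convention.

```python
def hanning_smooth(spectrum):
    """Apply a three-point Hanning kernel (0.25, 0.5, 0.25) along channels.

    Edge channels are left unchanged (an assumed convention); works for
    real amplitudes or complex visibilities alike.
    """
    n = len(spectrum)
    out = list(spectrum)
    for i in range(1, n - 1):
        out[i] = (0.25 * spectrum[i - 1]
                  + 0.5 * spectrum[i]
                  + 0.25 * spectrum[i + 1])
    return out


# A single-channel spike is spread over its neighbours and halved in height.
smoothed = hanning_smooth([0.0, 0.0, 4.0, 0.0, 0.0])
```

The smoothing suppresses the sinc-like sidelobes that cause ringing, at the cost of reducing the effective spectral resolution by roughly a factor of two.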

There is a philosophy which states ‘no data is better than bad data’ (a comment made on data editing at the 13th Synthesis Imaging Workshop at Socorro, May 2012), promoting aggressive flagging, while others would rather flag 80-90% of the RFI and let some of the weaker RFI remain (Rob Beswick; commenting on SERPent at the RadioNet Advanced Radio Astronomy workshop in Manchester, November 2012). Obviously both strategies cannot be accommodated under total automation, so SERPent lets the user decide some of the flagging parameters. These parameters include the aggressiveness (β), subset sizes (N) and kickout thresholds. The AIPS REFLG task has also been seen to over-flag at times, although it is necessary to condense the number of rows in the AIPS FG table. For further discussion on REFLG, see Section 2.5.

The computational performance of SERPent is probably the area which requires most improvement, and future plans are in place to improve this (see Section 2.5). It currently flags ∼ 110 GB/day with 16 CPUs, which is reasonable for commissioning e-MERLIN datasets. However, for e-MERLIN Legacy data this will be slow. Including more CPUs could obviously solve this problem, as 16 CPUs is still very modest in modern computing terms, but this merely shifts the problem onto hardware (and isn’t very constructive). The flagging sequence makes two full passes through the SumThreshold method (the original AOflagger; Offringa et al. 2010b makes 5 passes) in order to maximise RFI detections, and skips these passes if the threshold level is low enough. This is currently the limiting factor in terms of performance. Reducing this to one full pass would speed SERPent up considerably at the expense of RFI mitigation performance. Note that the amount of RFI also affects computational performance: more RFI means more full runs are completed within SERPent, whereas less RFI means more cycles are skipped by the kickout clauses implemented in the flagging sequence to stop over-flagging and increase speed.
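To make the cost of a SumThreshold pass concrete, the following is a simplified one-dimensional sketch of one common formulation of the method. It is not SERPent's or AOflagger's implementation, which operate on two-dimensional time-frequency arrays in both directions. The function name `sum_threshold_1d`, the default subset sizes and the value ρ = 1.5 are assumptions made for illustration.

```python
import math


def sum_threshold_1d(values, flags, chi_1, subset_sizes=(1, 2, 4, 8), rho=1.5):
    """One SumThreshold pass over a 1D sequence of amplitudes.

    For each subset size n, the threshold is chi_n = chi_1 / rho**log2(n).
    A window of n consecutive samples is flagged when its sum exceeds
    n * chi_n; samples flagged at an earlier subset size are replaced by
    chi_n in the sum so that one strong spike cannot flag its neighbours.
    """
    flags = list(flags)
    for n in subset_sizes:
        chi_n = chi_1 / rho ** math.log2(n)
        new_flags = list(flags)
        for start in range(len(values) - n + 1):
            window = range(start, start + n)
            total = sum(chi_n if flags[i] else values[i] for i in window)
            if total > n * chi_n:
                for i in window:
                    new_flags[i] = True
        flags = new_flags
    return flags


# A strong spike is caught at n = 1; weak broad RFI is only caught at n = 2.
spike = sum_threshold_1d([1, 1, 10, 1, 1, 1], [False] * 6, 5.0, (1, 2))
broad = sum_threshold_1d([1, 1, 4, 4, 1, 1], [False] * 6, 5.0, (1, 2))
```

Each subset size requires a sweep over every sample, so the run time of a pass scales with the number of samples multiplied by the number of subset sizes, and each additional full pass repeats that cost.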

Compared with flagging implementations on the JVLA and LOFAR, SERPent's throughput (data volume per unit processing time) appears to be lower. In the case of LOFAR, the AOflagger has been written in a compiled language (C++) and includes specific compiler settings to achieve optimal performance (Offringa 2012). In addition, the AOflagger is heavily parallelised over multiple cores and nodes on a computing cluster, is vectorised, and is part of the LOFAR pipeline which fully reduces and calibrates observations for users. This is different in the case of e-MERLIN, where the data will still be in a raw format when presented to the user, who will not have access to the same computing facilities as LOFAR. Work is currently being conducted on a general e-MERLIN pipeline, and SERPent is the flagging software implemented for the reduction passage. However, this is only a general pipeline and does not account for the many calibration techniques and methods needed for the diverse projects e-MERLIN will observe.

In the case of the JVLA, there is no implementation that is as sophisticated in mitigating RFI as the AOflagger or SERPent methods. The CASA software package is the main choice for the JVLA, and all developments are focused on this package. In contrast, e-MERLIN currently favours AIPS because fringe fitting (the calculation of delay, rate and phase offset solutions for each antenna, needed because residual errors remain after cross-correlation in the correlator) is available within the program and is required to calibrate e-MERLIN data.

According to feedback received from users, SERPent can be rather aggressive at times. Whilst differing flagging philosophies can account for these views, it should also be considered that e-MERLIN is not a completely settled system, with noticeable improvements in data quality from month to month. For example, there were filter issues with some of the COBRaS April 2012 L-band datasets which have since been resolved (May 2013 L-band dataset), but which caused amplitude level issues that then affected RFI mitigation performance. e-MERLIN is a heterogeneous array whose antennas have other responsibilities outside of e-MERLIN (Lovell and Cambridge partake in EVN observations). This contrasts with dedicated arrays such as the VLA/JVLA and ALMA, both essentially homogeneous arrays (ALMA has 2 types of antennas) which were modelled extensively before commissioning. This provides a much smoother transition from the commissioning to fully commissioned phases for the JVLA and ALMA. These factors should not be overlooked with respect to e-MERLIN commissioning and early Legacy datasets, because both hardware and software changes make maintaining external software such as SERPent difficult.

of RFI. The kickout clause was added to ensure that good data was protected from the flagger as more aggressive flagging was required. Inadvertently, this variable can set the overall aggressiveness by allowing the longer flagging runs to flag close to the median of each sample. The aggressiveness parameter β has stayed constant since the inclusion of the kickout clause. To achieve optimal flagging, β may need to be set automatically from spread statistics (i.e. how many different levels of RFI there are) rather than empirically.

If β is large enough, lowering the aggressiveness, then the kickout clause should not be invoked, but in the regime of aggressive flagging with a low β value it can affect how much is flagged. In essence, β sets the first threshold level, ρ determines the difference between the subsequent threshold levels χ(N) (i.e. a function of the number of subsets N), and the kickout clause determines when to stop the algorithm if χ(N) gets too close to the median.
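The interplay between β, ρ and the kickout clause can be sketched as follows. This is a hedged illustration rather than SERPent's code: it assumes the first threshold has the form median + β·MAD, that subsequent levels follow χ(N) = χ(1)/ρ^log2(N), and that the kickout level is median + k·MAD. The function name `threshold_levels`, the parameter `kickout_sigma` (k) and all numerical values are hypothetical.

```python
import math


def threshold_levels(med, mad, beta, rho, subset_sizes, kickout_sigma):
    """Return the (N, chi_N) levels that would be run before kickout.

    Assumed forms (illustrative only):
      chi_1   = med + beta * mad
      chi_N   = chi_1 / rho ** log2(N)
      kickout = med + kickout_sigma * mad
    The sequence stops at the first chi_N at or below the kickout level.
    """
    chi_1 = med + beta * mad
    kickout = med + kickout_sigma * mad
    levels = []
    for n in subset_sizes:
        chi_n = chi_1 / rho ** math.log2(n)
        if chi_n <= kickout:
            break  # threshold too close to the median: stop flagging
        levels.append((n, chi_n))
    return levels


sizes = (1, 2, 4, 8, 16)
conservative = threshold_levels(1.0, 0.1, 100.0, 1.5, sizes, 3.0)
aggressive = threshold_levels(1.0, 0.1, 25.0, 1.5, sizes, 3.0)
```

With these illustrative numbers the larger β completes every subset size without reaching the kickout level, whereas the smaller β is stopped after N = 4, which is the behaviour described above.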

Furthermore, additional tuning of SERPent’s flagging parameters may yet yield better settings for both flagging and speed performance. The best time to hone these settings will be once e-MERLIN has settled and finished its commissioning phase (see Future Work).
