Precise photometry and photo-zs with multi narrow-band data and deep learning



WARNING. Access to the contents of this doctoral thesis and their use must respect the rights of the author. The thesis can be used for reference or private study, as well as for research and learning activities or materials, under the terms established by Article 32 of the Spanish Consolidated Copyright Act (RDL 1/1996). Express and prior authorisation of the author is required for any other use. In any case, when using its content, the full name of the author and the title of the thesis must be clearly indicated. Reproduction or other forms of for-profit use, or public communication from outside the TDX service, are not allowed. Presentation of its content in a window or frame external to TDX (framing) is not authorised either. These rights affect both the content of the thesis and its abstracts and indexes.


Laura Cabayol García


Doctoral Programme in Physics, Universitat Autònoma de Barcelona

Director: Dr. Martin Børstad Eriksen

Tutor: Dr. Manuel Delfino Reznicek

Institut de Física d'Altes Energies

A thesis submitted for the degree of Philosophiae Doctor (PhD)

June 2022


Abstract

In the last decades, galaxy surveys have triggered unprecedented progress in our understanding of the Universe. Better astronomical cameras and more powerful computers have enabled the collection of more and better data. Astronomical images need to be processed into photometric catalogues and, ultimately, into photometric redshifts. Current galaxy surveys have observed of the order of millions of galaxies, while upcoming surveys like Euclid or LSST will increase these numbers to billions. These data will require fast and precise methods to extract the photometry and the photometric redshift.

In this thesis, we have used data from the Physics of the Accelerating Universe Survey (PAUS) to develop an end-to-end deep-learning algorithm that extracts the photometry and predicts the photometric redshift from astronomical images. We have built the pipeline in three steps, gradually increasing the complexity of the data-reduction operation. In this step-wise approach, we have optimised each photometry process independently, learning about the data, the requirements of the network, and its underlying mechanisms.

The first project predicts the background noise in the presence of nuisance artefacts and strongly varying backgrounds. On average, our deep-learning background measurements improve the photometry by 7%, and by up to 20% at the bright end. The background measurements also reduce the photometric redshift outlier rate by 35% for the best 20% of galaxies.

The second project measures the probability distribution of the photometry in single-exposure images. On average, the deep-learning photometry increases the signal-to-noise of the flux measurements by a factor of two compared to an existing aperture photometry algorithm. This algorithm also has other advantages, such as robustness to distorting artefacts (e.g. cosmic rays or scattered light), the ability to deblend, and lower sensitivity to uncertainties in the galaxy profile parameters used to infer the photometry. This reduces the fraction of photometric outlier observations from 10% to 2%, compared to aperture photometry.

The thesis also presents a novel methodology to enable better broad-band photometric redshifts using data only available for a fraction of the observations. The method consists of a multi-task neural network that predicts the photometric redshift and the PAUS narrow-band photometry. The photometry estimation is an auxiliary quantity that correlates with the redshift. This forces the network to learn a general solution capable of predicting the photometry and the redshift simultaneously. As the auxiliary data are not used as input to the network, we can estimate the redshift of any galaxy, even when such data are not available. In the COSMOS field, we find that the method predicts photometric redshifts that are 14% more precise down to magnitude i_AB < 23, while reducing the outlier rate by 40% with respect to the broad-band photometric redshifts. Furthermore, on simulated data, training on a sample with i_AB < 23, the method reduces the photo-z scatter by 15% for all galaxies with 24 < i_AB < 25.

Finally, the last step expands the single-band photometry measurements to multi-band photometry. Using information from the full galaxy spectral energy distribution, this network predicts the photometry in each of the bands and the photometric redshift. This method doubles the signal-to-noise ratio of the galaxy photometry with respect to the Lumos photometry. Furthermore, colour histograms indicate that the multi-band photometry contains less noise than the Lumos and MEMBA photometry, since the width of the colour histograms is reduced by 5 and 3, respectively. The photometric redshifts are trained on simulations and adapted to the data using transfer learning. These photo-zs improve on the BCNz2 template-based photo-z measurements, particularly at the faint end, where they are 25% more precise. However, they have not yet reached the Deepz precision. This project is still a work in progress, and in the near future we aim to study and improve the photo-z precision at the bright end.


Prologue

In recent decades, technological improvements, such as the computing power of computers and better photodetectors, have led to unprecedented progress in our knowledge of the Universe.

Systematic surveys of the Universe have allowed us to obtain photometric galaxy catalogues, which are needed to compute the distance to galaxies (redshift) and thus to map the Universe. To date, of the order of millions of galaxies have been observed, but upcoming surveys, such as those that Euclid or LSST will carry out, will observe of the order of billions. All these data will require fast and accurate methods to compute the photometry and the redshift of galaxies.

This thesis focuses on the development of a deep-learning algorithm that simultaneously measures the photometry and the redshift of a galaxy. The algorithm is applied directly to astronomical images and works end to end, gradually increasing the complexity of the data-extraction process. In this way, we have optimised each image-reduction step independently, which has allowed us to learn the requirements and mechanisms of the neural networks used. To develop the method, we have used data from the Physics of the Accelerating Universe Survey (PAUS).

The first part of the thesis focuses on predicting the background noise of the images using convolutional neural networks. On average, the algorithm improves the galaxy photometry by between 7% and 20%. Moreover, our noise measurements reduce the photo-z outliers in the sample by 35%.

In the second part of the thesis, we extend the previous work and develop a neural network that measures the probability distribution of the photometry in each photometric band independently. On average, our photometry doubles the signal-to-noise of the flux measurements made with an existing aperture photometry code. Our deep-learning algorithm also brings other benefits, such as robustness in the presence of distorting artefacts, e.g. cosmic rays, and lower sensitivity to inaccuracies in the parameters that define the galaxies. This reduces the fraction of galaxies with outlier photometry from 10% to 2%, compared with aperture photometry.

The thesis also explores how to improve the redshift measurements of galaxies imaged with broad-band photometric filters (low wavelength resolution) by using narrow-band observations. The method consists of a multi-task neural network that predicts the redshift and the narrow-band photometry of a galaxy from its broad-band photometry. The photometry is correlated with the redshift, so the neural network can use the knowledge acquired in predicting one of the quantities to improve the other. The narrow-band photometry is not an input to the neural network. Therefore, once trained, the network can predict the redshift of any galaxy from its broad-band photometry without requiring narrow-band photometry. In the COSMOS field, our method predicts photo-zs that are 14% more precise down to magnitude i_AB < 23 and reduces the number of photo-z outliers by 40%. Moreover, we have shown with simulations that the multi-task neural network also reduces the photo-z scatter of galaxies with magnitude 24 < i_AB < 25 by 15%.

The last chapter measures the multi-band photometry and the photo-z of galaxies from the images. This network uses the information available in all the images of a galaxy to make predictions in each of the bands. The multi-band photometry doubles the signal-to-noise of the band-by-band photometry. The redshift predictions are still a work in progress and have room for improvement. We are working to understand a systematic trend in the photo-z of bright galaxies.


I Concepts

1 Deep learning background

1.1 Gentle introduction to machine learning

1.2 Implementation of deep learning algorithms

1.2.1 Convolutional Neural Networks

1.2.2 Mixture density networks

1.2.3 Multi-task learning

1.3 Deep learning in astronomy

1.3.1 Object classification

1.3.2 Photometric redshift estimation

1.3.3 Other applications

2 The PAU Survey

2.1 Galaxy surveys

2.1.1 Imaging surveys

2.2 The PAU Survey

2.2.1 History

2.2.2 PAUCam characteristics and construction

2.2.3 PAUCam commissioning

2.2.4 Science goals

3 Image processing, photometry and photometric redshifts

3.1 From raw to science images

3.1.1 Bias subtraction

3.1.2 Flat fielding

3.1.3 Dark current

3.1.4 Cosmic rays and other spurious artefacts

3.2 Photometry

3.2.1 Photometry calibration

3.3 Further processing: co-adding flux measurements and derived properties

3.3.1 Co-addition

3.3.2 Object classification

3.3.3 Photometric redshift estimation

3.4 PAUS data management

3.4.1 From raw to science images: the Nightly pipeline

3.4.2 Measuring photometry: the MEMBA pipeline

3.4.3 PAUS photometric redshifts

3.4.4 PAUS data in COSMOS

II Galaxy photometry and photo-zs with deep learning

4 Background light prediction

4.1 Motivation

4.2 Modelling scattered light

4.2.1 The PAUS observations

4.2.2 Scattered-light templates

4.2.3 Scattered-light templates as scattered-light correcting method

4.3 BKGnet: A Deep Learning based method to predict the background

4.3.1 Neural network architecture

4.3.2 Data: training and test samples

4.3.3 Training process and loss function

4.4 Testing BKGnet on simulations

4.4.1 Simulated PAUCam background images

4.4.2 BKGnet predictions on simulations

4.5 BKGnet on PAUCam images

4.6 BKGnet validation

4.6.1 Generating the PAUS catalogue with BKGnet predictions

4.6.2 Validating the catalogues

4.7 Conclusions

4.A Variable annulus

5 Single band photometry

5.1 Motivation

5.2 Data

5.2.1 PAUS data

5.2.2 Teahupoo simulations

5.2.3 Comparison between PAUCam and Teahupoo galaxies

5.3 Flux estimation methods

5.3.1 Profile fitting

5.3.2 Aperture photometry

5.3.3 Weighted pixel sum

5.4 Lumos: Measuring fluxes with a CNN

5.4.1 Input data

5.4.2 Lumos architecture

5.4.3 Unsupervised transfer learning

5.4.4 Training procedure

5.5 Lumos flux measurements on simulations

5.5.1 Flux probability distributions

5.5.2 Comparison with different flux estimation methods

5.5.3 Deblending with Lumos

5.6 Lumos photometry on PAUS data

5.6.1 Single exposure measurements

5.6.2 Comparison with SDSS spectroscopy

5.6.3 Coadded flux measurements

5.6.4 Photometric redshift estimates

5.7 Conclusions and discussion

5.A Flux estimation methods: derivations

5.B Forecasting the effect of errors on profile parameters

5.C Photometric redshifts with BCNz2 on PAUS galaxy mocks

5.D Colour histograms in the complete narrow band set

5.E Photometry and photo-z correlations with galaxy parameters

5.E.1 Lumos photometry correlation with galaxy parameters

5.E.2 Photo-z correlation with galaxy parameters

6 Improving broadband photometric redshifts with multi-task learning

6.1 Motivation

6.2 Data

6.2.1 PAUS data

6.2.2 Photometric redshift sample

6.2.3 Broadband data

6.2.4 Spectroscopic galaxy sample

6.2.5 Galaxy mocks

6.3 Multi-task neural network to improve broad-band photo-z

6.3.1 Multi-task learning

6.3.2 Model architecture and training procedures

6.4 Photo-z performance in the COSMOS field

6.4.1 Photo-z performance metrics

6.4.2 Photo-z dispersion

6.4.3 Photo-z bias and outlier rate

6.4.4 Redshift distributions, N(z)

6.5 Photo-z performance on deeper galaxy simulations

6.6 Photo-z in colour-space

6.6.1 MTL photo-z in colour-space

6.6.2 Broad-band degeneracies in colour-space

6.6.3 Emission line confusions

6.7 Understanding the MTL underlying mechanism

6.7.1 Underlying data representation in colour-space with MTL

6.7.2 MTL with other galaxy parameters

6.7.3 Effect of narrow-band resolution

6.8 Discussion and conclusions

6.A Self-organising maps

6.B Effect of training with photo-z as ground-truth targets

6.B.1 Effect of photo-z dispersion in the training redshifts

6.B.2 Effect of photo-z outliers in the training redshifts

7 Multi-band photometry and photometric redshifts

7.1 Motivation

7.2 Data

7.2.1 PAUS data

7.2.2 Spectroscopic sample

7.2.3 PAUS image simulations

7.3 Network architecture and training procedure

7.3.1 Feature extraction with a convolutional neural network

7.3.2 Photometry and photo-zs with MDN

7.3.3 Training procedure

7.4 Multi-band photometry measurements

7.4.1 Aczio photometry on simulated data

7.4.2 Aczio fluxes on PAUCam images

7.4.3 Flux uncertainties

7.4.4 Signal-to-noise on PAUCam data

7.4.5 Colour histograms

7.4.6 Robustness of the method

7.5 Photo-z measurements

7.5.1 Photo-zs in COSMOS

7.5.2 Photo-z probability distributions

7.5.3 Colour space

7.6 Conclusions

7.A Effect of uninformative features in the input of a neural network

7.B Simulated broad-band photometry

7.C Scattering the ground-truth redshifts

8 Summary and conclusions


Introduction

Astronomers have been observing the Universe for centuries. In Ancient Greece, well-known figures such as Anaxagoras and Ptolemy studied astronomical phenomena such as eclipses, the brightness of celestial objects or their rotational movement by observing the sky. In 1929, galaxy surveys became a standard tool in astronomy, which entailed a shift in astronomical studies from the analysis of single observations to statistical analyses (Okamura, 2020).

Galaxy redshift surveys are a powerful tool to study the Universe. They map a region of the sky and locate the position and redshift of the objects within it. In the last decades, there has been a breakthrough in the amount and quality of galaxy survey data, leading to unprecedented progress in our understanding of the Universe. For example, the Palomar Observatory Sky Survey (POSS-I, Minkowski & Abell, 1963) imaged 2/3 of the observable sky from Palomar Mountain to i_AB < 21 on photographic plates back in 1950. In contrast, current surveys are observing hundreds of millions of galaxies (The Dark Energy Survey Collaboration, 2005; de Jong et al., 2013) to fainter magnitudes, i_AB ∼ 24.

Furthermore, in the next decades, the number of observed galaxies, the sky coverage, and the observation depth will be extended by the next generation of ground- and space-based telescopes. Euclid (Laureijs et al., 2011) will observe 15 000 deg², yielding photometry and photometric redshifts for about 10 billion sources. LSST (Ivezić et al., 2019a) will image 20 billion galaxies to i_AB < 24. These data will require fast and precise methods to turn astronomical images into photometry and, ultimately, into photometric redshift catalogues.

Galaxy surveys can be broadly classified as spectroscopic or photometric surveys. The former split the light into narrow wavelength bins, enabling very precise redshift determinations. However, spectroscopy is time-consuming and its efficiency in obtaining redshifts is low. In contrast, photometric surveys image the sky using a few photometric pass-band filters at different wavelengths. This enables observing many objects simultaneously, but at the expense of a lower wavelength resolution, leading to less precise redshift measurements. While spectroscopic data are powerful for galaxy evolution studies, e.g. star formation and mergers (Robotham et al., 2014) and the environmental dependence of galaxy evolution (Alpaslan et al., 2015), large photometric data sets are very suitable for large-scale structure and gravitational lensing analyses (Kuijken et al., 2015; Elvin-Poole et al., 2018a).

Obtaining precise photometric redshifts is crucial for most cosmological studies. This has prompted important efforts to improve redshift estimation methods, leading to a wealth of techniques optimised for different science applications and types of data (e.g. Feldmann et al. 2006; Brammer et al. 2008; Eriksen et al. 2019). These techniques typically use the photometry measured from the astronomical images. Therefore, the data reduction process that converts astronomical images into photometric catalogues is a key step in the determination of accurate photometric redshifts.

This thesis uses data from the Physics of the Accelerating Universe Survey (PAUS, Martí et al., 2014), a unique imaging redshift survey taking data with a camera equipped with 40 narrow-band filters (Padilla et al., 2019a). Such a large number of photometric filters gives PAUS a wavelength resolution in between broad-band photometry and spectroscopy, which reduces the photometric redshift uncertainty by a factor of around 15 with respect to typical broad-band imaging surveys (Eriksen et al., 2019; Eriksen et al., 2020; Soo et al., 2021).

Deep learning techniques have undergone an unprecedented revolution over the last few years. This has been prompted by an increasing amount of available data and computing power, together with a better theoretical understanding of the techniques. The development of Graphical Processing Units (GPUs) has been crucial for speeding up the computation of modern deep-learning algorithms, enabling the growth of a new deep-learning field devoted to techniques applied to images. These techniques have also reached astronomy, where applying deep learning tools to the increasing amount of astronomical images is a promising new avenue (e.g. Pasquet et al., 2019; Zhang & Bloom, 2019; Arcelin et al., 2021).

In this thesis, we have developed an end-to-end deep learning pipeline to go from PAUS science images to photometry and photometric redshifts. Chapters 1, 2, and 3 are an introduction with useful information for understanding the thesis. The first (§1) presents the basic concepts needed to understand neural networks and introduces the architectures and training methodologies used across the thesis. The second introductory chapter (§2) presents the PAU Survey, its camera and its science goals. Finally, the last part of the introduction (§3) explains the general data reduction steps that turn raw astronomical images into photometry and photometric redshift catalogues. The same chapter also introduces the PAUS data management pipeline. This consists of a de-trending code that converts raw images into reduced science images (§3.4.1) and a second part that estimates the photometry from such reduced images (§3.4.2).

Our deep-learning pipeline has been developed in multiple independent steps, each of them addressing a data reduction operation. Studying each step independently gives a better understanding of the data and the network's requirements, e.g. which information is relevant to improve the photometry and how to provide the network with such information. First, in Chapter 4, we predict the sky-background noise using CNNs, which is our first step towards reliable photometry and photo-zs. We introduce BKGnet, an algorithm that makes accurate background noise predictions at the source location in the presence of nuisance artefacts and strongly varying background light. This work is published in Cabayol-Garcia et al. (2020), titled “The PAU Survey: Background light estimation with deep learning techniques”.
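For context, a conventional way to estimate the background at a source's position is to take a robust statistic of the sky pixels in an annulus around the aperture and subtract it from the aperture sum. The sketch below is a minimal NumPy version of that baseline under stated assumptions; it is not BKGnet or the PAUS pipeline code, and the function name and radii are hypothetical.

```python
import numpy as np


def aperture_flux_annulus(image, x0, y0, r_ap, r_in, r_out):
    """Aperture flux with a local sky level estimated in a surrounding annulus.

    Hypothetical helper: the sky per pixel is the median of the pixels with
    r_in <= r < r_out, and it is subtracted from every pixel with r < r_ap.
    """
    yy, xx = np.indices(image.shape)          # row (y) and column (x) indices
    r = np.hypot(xx - x0, yy - y0)            # distance of each pixel to the source
    in_aperture = r < r_ap
    in_annulus = (r >= r_in) & (r < r_out)
    sky_per_pixel = np.median(image[in_annulus])  # median resists a few outliers
    raw_sum = image[in_aperture].sum()
    flux = raw_sum - sky_per_pixel * in_aperture.sum()
    return flux, sky_per_pixel


# Toy image: flat sky of 10 counts/pixel plus a 500-count point source
image = np.full((41, 41), 10.0)
image[20, 20] += 500.0
flux, sky = aperture_flux_annulus(image, x0=20, y0=20, r_ap=5, r_in=10, r_out=15)
```

On a flat sky this recovers the injected flux exactly. An artefact or a strongly varying background that overlaps the annulus biases `sky_per_pixel`, which is the kind of situation in which a learned background prediction is meant to help.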

Chapter 5 introduces Lumos, a CNN that predicts the probability distribution of the background-subtracted photometry. Lumos uses the experience from BKGnet to tackle a more complex data reduction operation. This work is published as “The PAU survey: Estimating galaxy photometry with deep learning” (Cabayol et al., 2021).
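Predicting a probability distribution rather than a single flux value is commonly done with a mixture density network (§1.2.2): the network outputs the weights, means and widths of a Gaussian mixture, and training minimises the negative log-likelihood of the true value under that mixture. The functions below are a minimal NumPy sketch of that loss and of the mapping from raw network outputs to valid parameters, assuming Gaussian components; they are illustrative and not taken from the Lumos code.

```python
import numpy as np


def gaussian_mixture_nll(y, weights, means, sigmas):
    """Mean negative log-likelihood of targets under per-object Gaussian mixtures.

    y: (N,) true values; weights, means, sigmas: (N, K) mixture parameters,
    with weights summing to 1 along K and sigmas > 0.
    """
    y = y[:, None]  # broadcast each target against its K components
    log_comp = (
        np.log(weights)
        - 0.5 * np.log(2.0 * np.pi)
        - np.log(sigmas)
        - 0.5 * ((y - means) / sigmas) ** 2
    )
    # log-sum-exp over components, shifted by the maximum for numerical stability
    m = log_comp.max(axis=1, keepdims=True)
    log_like = m[:, 0] + np.log(np.exp(log_comp - m).sum(axis=1))
    return -log_like.mean()


def mixture_params(raw_w, raw_mu, raw_s):
    """Map unconstrained network outputs to valid mixture parameters."""
    w = np.exp(raw_w - raw_w.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)  # softmax: weights sum to 1
    return w, raw_mu, np.exp(raw_s)    # exp keeps the widths positive
```

A single unit Gaussian centred on the target gives the familiar value 0.5 log(2π); splitting it into two identical half-weight components leaves the likelihood unchanged.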

Some of the techniques used to extend Lumos to measure multi-band photometry were first tested in Chapter 6, which introduces a multi-task learning network to enable better broad-band photometric redshifts using PAUS data. This network predicts the photo-z and the PAUS narrow-band photometry simultaneously from the broad-band photometry, combining both tasks in the loss function. The method only uses PAUS data during the training phase, to evaluate the accuracy of the network's narrow-band photometry predictions. Therefore, we can estimate the photo-z of any galaxy with broad-band photometry, without requiring narrow-band observations. This work is currently undergoing Euclid internal review for publication as “The PAU Survey & Euclid: Improving broad-band photometric redshifts with multi-task learning” (Cabayol et al. in prep.).
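Combining both tasks in the loss function can be sketched as a weighted sum of a redshift term and an auxiliary narrow-band term. In this minimal NumPy sketch, the mean-squared-error terms and the weight `nb_weight` are assumptions for illustration, not the loss used in Chapter 6.

```python
import numpy as np


def multitask_loss(z_pred, z_true, nb_pred, nb_true, nb_weight=1.0):
    """Redshift loss plus a weighted auxiliary narrow-band photometry loss.

    z_pred, z_true: (N,) redshifts; nb_pred, nb_true: (N, B) narrow-band
    fluxes (B = 40 for PAUS). nb_weight is a hypothetical balancing factor.
    """
    z_loss = np.mean((z_pred - z_true) ** 2)     # main task
    nb_loss = np.mean((nb_pred - nb_true) ** 2)  # auxiliary task, training only
    return z_loss + nb_weight * nb_loss


z_true = np.array([0.5, 1.0])
nb_true = np.ones((2, 40))
loss = multitask_loss(z_true + 0.1, z_true, nb_true + 0.2, nb_true, nb_weight=0.5)
# 0.1**2 + 0.5 * 0.2**2 = 0.01 + 0.02, i.e. approximately 0.03
```

Because only the loss needs the narrow-band targets, inference requires just the broad-band inputs and the redshift output; the narrow-band head can simply be ignored.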

Chapter 7 presents the last part of the photometric pipeline. This extends Lumos to predict the multi-band photometry and photo-z of any galaxy from its image observations. First, the network extracts a set of co-added features from all observations of a galaxy in a narrow band. Then, the features from all bands are used to predict the flux in each narrow-band filter, so that the network uses all the spectral energy distribution information encoded in the galaxy images to predict the photometry in a single band. This network uses the knowledge from Chapter 6 to implement multi-task learning training that simultaneously predicts the photometry and the photo-z. Both tasks share a set of network layers that capture data traits relevant for the two predictions. The photo-z prediction also takes the co-added features from all narrow bands as input, so that it uses the information available in all the images of a galaxy. This last chapter builds on all the previous work presented, expanding the photometry pipeline to go end-to-end and using the multi-task learning techniques tested in Chapter 6. This work is in preparation as “The PAU Survey: Multi-band photometry and photo-z from narrow-band images with deep learning” (Cabayol et al. in prep.).
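One simple way to combine a variable number of exposures per band into a fixed-size input is to average the per-exposure feature vectors within each band and concatenate the band averages. The sketch below assumes mean pooling and zero-filling of bands without exposures; both are illustrative choices, and the network in Chapter 7 may co-add its features differently. The function name `coadd_features` is hypothetical.

```python
import numpy as np


def coadd_features(exposure_features, band_index, n_bands):
    """Average per-exposure feature vectors within each band, then concatenate.

    exposure_features: (E, F) features from E single exposures of one galaxy.
    band_index: (E,) integer band of each exposure, in [0, n_bands).
    Returns an (n_bands * F,) vector; bands without exposures stay zero.
    """
    n_feat = exposure_features.shape[1]
    sums = np.zeros((n_bands, n_feat))
    counts = np.zeros(n_bands)
    np.add.at(sums, band_index, exposure_features)  # accumulate repeated indices
    np.add.at(counts, band_index, 1)
    means = np.divide(
        sums, counts[:, None], out=np.zeros_like(sums), where=counts[:, None] > 0
    )
    return means.reshape(-1)


# Two exposures in band 0, none in band 1, one in band 2
feats = np.array([[1.0, 2.0], [3.0, 4.0], [10.0, 10.0]])
bands = np.array([0, 0, 2])
combined = coadd_features(feats, bands, n_bands=3)
```

The resulting vector has the same length for every galaxy regardless of how many exposures it has, which is what allows a single downstream head to predict the flux in each band and the photo-z from all bands at once.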


Part I

Concepts


Chapter 1 Deep learning background

1.1 Gentle introduction to machine learning

Artificial intelligence (AI) is a field focused on constructing complex machines that can perform intellectual tasks normally carried out by humans. The term “artificial intelligence” was coined in 1956 at a conference at Dartmouth College (McCarthy et al., 2006). At that time, AI generated great expectations and considerable money was invested in the field.

The first, very simple Artificial Neural Network (ANN) was created in 1958 by Frank Rosenblatt and was named the ‘Perceptron’. Modern ANNs concatenate several layers, each constructed by putting together a collection of perceptrons. Each of these layers is typically responsible for learning a specific hidden pattern in the data.

The development of the perceptron created enthusiasm in the academic community. However, limited computing power prevented its extension to deeper ANNs (with more layers), and a single perceptron could only handle trivial versions of the problems it was meant to solve. This led to disappointment, and interest and investment in the field dropped off.
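The perceptron learning rule, and the kind of limitation described above, can be illustrated in a few lines of NumPy. A single perceptron with a step activation learns the logical AND of two inputs, but it cannot reproduce XOR, whose two classes cannot be separated by a straight line. This is a minimal sketch of Rosenblatt-style training; the learning rate and number of epochs are arbitrary choices.

```python
import numpy as np


def train_perceptron(X, y, epochs=20, lr=1.0):
    """Perceptron with a step activation; labels y are 0 or 1."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            pred = 1 if xi @ w + b > 0 else 0
            # Weights change only when the prediction is wrong
            w += lr * (yi - pred) * xi
            b += lr * (yi - pred)
    return w, b


def predict(X, w, b):
    return (X @ w + b > 0).astype(int)


X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y_and = np.array([0, 0, 0, 1])  # linearly separable
y_xor = np.array([0, 1, 1, 0])  # not linearly separable
```

Training on `y_and` converges to weights that classify all four inputs correctly, whereas no choice of `w` and `b` can classify all four XOR inputs, however long the training runs.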

As a consequence, in 1973, the UK Parliament severely criticised the progress in AI, which triggered a cut in AI investment; in the following years (1974 to 1980), research on AI was marginal (the “AI winter”).

In the eighties, interest in AI returned. In 1989, Yann LeCun developed an early version of a convolutional neural network (CNN, LeCun et al., 1989) that could recognise handwritten digits. This was successfully implemented in postal and banking services. Nevertheless, and despite the current popularity of CNNs, at that time computing power also limited CNN performance, and this type of network architecture was left aside.

These strong limitations on ANN performance meant that other types of AI had their heyday in the eighties. Expert systems were first introduced by Edward Feigenbaum in 1965.

These rely on two components: a knowledge base that provides a set of rules to carry out a task, and an inference engine that applies logical algorithms to the knowledge base to infer new rules. MYCIN (Buchanan & Shortliffe, 1984) is a classic expert system developed to diagnose and recommend medical treatment, supporting clinicians in the early diagnosis of meningitis. MYCIN's knowledge relies on approximately 500 antecedent-consequent rules that enabled it to recognise ∼100 causes of bacterial infections. To make a decision, MYCIN starts with information such as the age, sex, and medical history of the patient, moving on to more specific questions when required. Another example of a successful expert system implementation is Deep Blue (Campbell et al., 2002), a chess computer that defeated the world chess champion, Garry Kasparov, in 1997. Deep Blue combines an early supercomputer that evaluates approximately 150 million possible chess moves per second with a decision tree that calculates the best move. There were several successful expert system implementations; however, managing the knowledge base and writing accurate expert system rules was difficult.
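The split between a knowledge base and an inference engine can be sketched in plain Python. The rules below are made up for illustration and are not MYCIN's; MYCIN itself reasoned backwards from hypotheses and attached certainty factors to its rules, whereas this sketch uses simpler forward chaining, repeatedly firing every rule whose antecedents are all known until no new facts appear.

```python
def forward_chain(facts, rules):
    """Inference engine: apply if-then rules until nothing new can be inferred.

    facts: iterable of strings; rules: list of (antecedents, consequent) pairs.
    """
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for antecedents, consequent in rules:
            if consequent not in known and set(antecedents) <= known:
                known.add(consequent)  # fire the rule
                changed = True
    return known


# Knowledge base: made-up, illustrative rules only
RULES = [
    (("fever", "stiff_neck"), "possible_meningitis"),
    (("possible_meningitis", "bacteria_in_csf"), "bacterial_meningitis"),
    (("bacterial_meningitis",), "recommend_treatment"),
]

conclusions = forward_chain({"fever", "stiff_neck", "bacteria_in_csf"}, RULES)
```

The engine knows nothing about medicine: all domain knowledge lives in `RULES`, which is what made such systems powerful in a narrow domain and hard to transfer to another one.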

In the nineties, the hype around expert systems declined, which has two possible interpretations. The first is that, although expert systems provide deep, focused knowledge of a particular problem, this knowledge cannot be generalised to other tasks, e.g. MYCIN cannot be applied to, or easily adapted for, the diagnosis of encephalitis. Therefore, these algorithms could not expand into a more general AI technology fast enough to sustain the hype, and AI moved on. Another possible interpretation is that expert systems were absorbed by other tools that used their technology as part of other offerings, leaving the standalone expert system out of the spotlight. Nevertheless, there is still some research on standalone expert systems nowadays, e.g. Ahmed & Mahmoud (2020).

In 2008, Fei-Fei Li set up ImageNet (Deng et al., 2009), a database of annotated images that provides a common data set to train and benchmark models. ImageNet quickly scaled to 11 million images in 2010 and currently contains more than 14 million examples. The creation of ImageNet was an AI milestone that has eased research in computer vision tasks. In 2012, AlexNet (Krizhevsky et al., 2012) beat all previous results in image recognition on the ImageNet database. AlexNet is a CNN with eight layers: five convolutional layers followed by three fully-connected layers. The main result in Krizhevsky et al. (2012) was that the depth of the model (number of layers) was key to the network's performance. This is the origin of deep learning (DL), where the adjective “deep” refers to ANNs with a large number of layers. AlexNet was designed by Alex Krizhevsky under the supervision of Geoffrey Hinton, one of the pioneers of deep learning.
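The layer arithmetic behind AlexNet's five convolutional and three fully-connected layers can be checked with the standard output-size formula, floor((W + 2P - K)/S) + 1, for input width W, kernel K, stride S and padding P. The filter counts, kernel sizes and strides below are the ones commonly quoted for Krizhevsky et al. (2012); the 227×227 input, the padding values, and the omission of local response normalisation and of the two-GPU split are simplifying assumptions of this sketch.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial size after a convolution or pooling layer (square inputs)."""
    return (size + 2 * padding - kernel) // stride + 1


# (name, out_channels, kernel, stride, padding); None marks a max-pooling layer
ALEXNET_FEATURES = [
    ("conv1", 96, 11, 4, 0), ("pool1", None, 3, 2, 0),
    ("conv2", 256, 5, 1, 2), ("pool2", None, 3, 2, 0),
    ("conv3", 384, 3, 1, 1),
    ("conv4", 384, 3, 1, 1),
    ("conv5", 256, 3, 1, 1), ("pool5", None, 3, 2, 0),
]
FC_SIZES = [4096, 4096, 1000]  # three fully-connected layers, 1000 classes


def trace_shapes(size=227, channels=3):
    """Follow (channels, height = width) through the convolutional stack."""
    shapes = []
    for name, out_ch, kernel, stride, padding in ALEXNET_FEATURES:
        size = conv_output_size(size, kernel, stride, padding)
        if out_ch is not None:  # pooling keeps the number of channels
            channels = out_ch
        shapes.append((name, channels, size))
    return shapes, channels * size * size  # length of the flattened vector


shapes, n_flat = trace_shapes()
```

The convolutional stack shrinks the 227×227 image to 256 maps of 6×6 pixels, so the first fully-connected layer receives a vector of 9216 values.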

Nowadays, AI has become part of our daily lives. The explosion of smartphones and similar devices collecting huge amounts of data, together with the development of applications using AI (e.g. voice and search assistants), led major companies like Google and Facebook to invest heavily in AI research. This revolution was boosted by several factors:

Big data: Over the last years, the amount of available data has quickly increased thanks to devices like smartphones and computers. These gadgets generate huge amounts of data daily, collected by services like Google, Facebook, YouTube, and Instagram. While traditional shallow ANNs do not benefit much from having more data, deep ANNs can boost their performance by using very large data sets.

Processing power: Graphics Processing Units (GPUs) have emerged as technologies to speed up computation, especially for deep learning algorithms that require many computations to run in parallel. GPUs were initially developed to accelerate graphics processing; however, they have become a crucial part of modern deep learning. GPUs are central to the increase in computing power (Oh & Jung, 2004).

GPUs were already used to train AlexNet.

Complex models trained with large data sets demand more computational power. While high-end GPUs can be very expensive, cloud services offer a cheaper alternative that gives many more people access to computational power.

Open-source software: Recently, several deep learning frameworks have been developed and released as open-source software, enabling a wider and standardised application of machine learning tools. Important examples are Keras (Chollet et al., 2015), TensorFlow (Abadi et al., 2015), and PyTorch (Paszke et al., 2017), the last two developed by Google and Facebook, respectively.

1.2 Implementation of deep learning algorithms

Machine learning (ML) is a branch of AI in which algorithms learn how to solve a specific problem from data. In classical software, the routines that perform a task are hand-coded as an explicit set of instructions. Instead, machine learning algorithms iteratively learn from the data how to perform the task, in a process called training.

To create algorithms that learn similarly to humans, ANN architectures are inspired by the structure of the human brain. As mentioned in §1.1, the computational unit of an ANN is the perceptron, the analogue of a neuron, and perceptrons are grouped into layers. The first layer of the network is the input layer and the last one is the output layer. The layers in between are the so-called hidden layers (see Fig. 1.1).

Supervised ANNs model a problem by optimising a set of trainable parameters (the weights of the perceptrons) to fit the data. This is done using a training sample, i.e. a data set of input examples with known solutions. Given the training sample, the ANN optimises its trainable parameters to minimise the difference between the predicted and the expected outputs. We can distinguish three stages in the training of an ANN: forward propagation, backpropagation, and weight optimisation.

The training starts with forward propagation. At this stage, the input data (x_i^in) propagate through all the network layers, and the output layer provides a prediction for each input sample. The type of prediction depends on the problem the network is addressing. If the network is a classifier, it predicts the class the input example belongs to. If the network addresses a regression problem, the prediction is a value that can be accompanied by other quantities such as its uncertainty or covariance. In the case of a linear (fully-connected) ANN, the forward propagation through one layer reads

$$\vec{x}\,' = \phi(\mathbf{w} \cdot \vec{x} + \vec{b}) \qquad (1.1)$$


[Figure 1.1: network diagram with an input layer, a hidden layer, and an output layer; arrows indicate forward propagation and backpropagation.]

Figure 1.1: ANN composed of an input, a hidden, and an output layer. Each circle represents a perceptron with its associated weights w_i. The black lines are the connections between the perceptrons in one layer and those in the following one. This particular example takes two input variables and outputs a single prediction.

where x' is the signal after forward propagation through the layer, x is the input vector to the layer, w is the weight matrix, and b is the bias term. After each layer there is an activation function (φ), a non-linear function that maps the output of a layer to the input of the following one (see Fig. 1.1). This is required to introduce non-linearities in the model: without it, a stack of layers would reduce to a single linear (affine) transformation. Common activation functions include the sigmoid function and the hyperbolic tangent. Recently, the ReLU function (Nair & Hinton, 2010),

$$\phi(\vec{x}) = \max(0, \vec{x}) \qquad (1.2)$$

has become the default activation function for many neural networks. The ReLU usually achieves better convergence and is computationally more efficient than previously common activation functions.
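A minimal NumPy sketch of Eq. 1.1 and Eq. 1.2 for a single fully-connected layer may make the notation concrete. It assumes the convention that w has shape (n_out, n_in), so that the layer computes w @ x + b; the weights and inputs are illustrative numbers, not values from this thesis.

```python
import numpy as np

def relu(x):
    """ReLU activation (Eq. 1.2), applied element-wise."""
    return np.maximum(0.0, x)

def dense_forward(x, w, b, activation=relu):
    """Forward pass of one fully-connected layer, x' = phi(w x + b) (Eq. 1.1).

    x: input vector, shape (n_in,)
    w: weight matrix, shape (n_out, n_in)
    b: bias vector, shape (n_out,)
    """
    return activation(w @ x + b)

# Illustrative layer mapping 2 inputs to 3 outputs.
x = np.array([1.0, -2.0])
w = np.array([[0.5, 0.25],
              [-1.0, 0.5],
              [2.0, 1.0]])
b = np.array([0.1, 0.0, -0.5])
x_out = dense_forward(x, w, b)   # negative pre-activations are set to zero by the ReLU
```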

The main limitation of the ReLU function arises when many ReLU neurons only output zero, which is known as the dying ReLU problem. As the slope in the negative range is zero, no gradient flows through these dead neurons, so their weights stop being updated and they remain stuck outputting zero. Variations of the ReLU function, e.g. the LeakyReLU (Xu et al., 2015) and the SELU (Klambauer et al., 2017), were proposed to mitigate the dying ReLU problem.
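The dying ReLU problem can be seen directly in the derivatives. The sketch below, assuming a LeakyReLU negative slope of 0.01 (a common default, chosen here for illustration), compares the gradients that ReLU and LeakyReLU pass back for a neuron whose pre-activations are all negative.

```python
import numpy as np

def relu_grad(x):
    """Derivative of ReLU: 1 for x > 0, 0 otherwise (the value at x = 0 is set to 0)."""
    return (x > 0).astype(float)

def leaky_relu(x, slope=0.01):
    """LeakyReLU: identity for x > 0, small linear slope for x <= 0."""
    return np.where(x > 0, x, slope * x)

def leaky_relu_grad(x, slope=0.01):
    """Derivative of LeakyReLU: never exactly zero, so gradients keep flowing."""
    return np.where(x > 0, 1.0, slope)

# Pre-activations of a neuron that only receives negative inputs ("dead" for ReLU).
z = np.array([-3.0, -0.5, -1.2])
print(relu_grad(z))        # all zeros: the neuron's weights receive no update
print(leaky_relu_grad(z))  # all equal to the slope: small but non-zero updates
```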

After forward propagation, the prediction is compared with the known true value (label) using a loss function that evaluates how well the algorithm models the data. There are many different loss functions, and the choice depends on the task being optimised.

Classification problems typically use a cross-entropy loss function (Good, 1952), although there are other options such as the Kullback-Leibler divergence (Kullback & Leibler, 1951). Regression networks predicting continuous values use, e.g., the mean squared error or the mean absolute error. In §1.2.2 we describe more complicated loss functions used to predict the probability distribution of a continuous target quantity, e.g. Eq. 1.5, which is used to optimise Gaussian mixture density networks.
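For reference, the losses mentioned above can be written in a few lines of NumPy. This is an illustrative sketch for single examples and small arrays, not the loss code used in this work; the class probabilities and regression values are made up.

```python
import numpy as np

def cross_entropy(probs, label):
    """Cross-entropy for one example: minus the log-probability of the true class."""
    return -np.log(probs[label])

def kl_divergence(p, q):
    """Kullback-Leibler divergence D_KL(p || q) between two discrete distributions."""
    p, q = np.asarray(p), np.asarray(q)
    mask = p > 0  # terms with p = 0 contribute nothing
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

def mse(y_pred, y_true):
    """Mean squared error for regression."""
    return np.mean((np.asarray(y_pred) - np.asarray(y_true)) ** 2)

def mae(y_pred, y_true):
    """Mean absolute error for regression."""
    return np.mean(np.abs(np.asarray(y_pred) - np.asarray(y_true)))

# Classifier output (e.g. after a softmax) for three classes; the true class is 1.
probs = np.array([0.2, 0.7, 0.1])
ce = cross_entropy(probs, 1)
# With a one-hot target, KL(one_hot || probs) equals the cross-entropy.
kl = kl_divergence([0.0, 1.0, 0.0], probs)

reg_mse = mse([1.0, 2.0, 3.0], [1.5, 2.0, 2.0])
reg_mae = mae([1.0, 2.0, 3.0], [1.5, 2.0, 2.0])
```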

The ultimate goal of training a supervised machine learning algorithm is to minimise the loss function efficiently. Backpropagation (Kelley, 1960) computes the contribution (gradient) of each ANN weight w to the loss function L after each forward pass using the chain rule. After backpropagation, the optimiser uses the estimated gradients to update the parameters in a way that reduces the loss function (weight optimisation). This whole procedure is repeated many times and is commonly implemented with a gradient descent algorithm, which reduces the loss function at each iteration while adapting the parameters to the data, until it converges to a minimum of the loss function (in general a local rather than the global one). The gradient points in the direction of steepest ascent of the loss function. Therefore, the parameters must be updated in the direction opposite to the gradient (hence the name gradient descent), i.e.

$$\vec{w} \leftarrow \vec{w} - \alpha \, \nabla \mathcal{L}(\vec{w}) \qquad (1.3)$$

where α is the so-called learning rate, which controls the size of each parameter update.

The gradients become smaller as the network approaches a minimum of the loss function, where they vanish.
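The three training stages (forward propagation, backpropagation, and weight optimisation with Eq. 1.3) can be sketched for a small network with one hidden ReLU layer trained by full-batch gradient descent on the mean squared error. The toy data set (y = sin x), the layer width, the initialisation, and the learning rate are all illustrative assumptions, and the gradients are derived by hand with the chain rule rather than by automatic differentiation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative toy data: y = sin(x) sampled at 64 points.
X = np.linspace(-3.0, 3.0, 64).reshape(-1, 1)   # shape (N, 1)
Y = np.sin(X)                                    # shape (N, 1)

# One hidden layer with 16 ReLU units and a linear output layer.
W1 = rng.normal(0.0, 1.0, size=(1, 16))
b1 = np.zeros(16)
W2 = rng.normal(0.0, 0.1, size=(16, 1))
b2 = np.zeros(1)

alpha = 0.01          # learning rate of Eq. 1.3
losses = []
for step in range(2000):
    # Forward propagation over the full sample.
    z1 = X @ W1 + b1
    h1 = np.maximum(0.0, z1)
    y_hat = h1 @ W2 + b2
    losses.append(np.mean((y_hat - Y) ** 2))     # MSE loss

    # Backpropagation: apply the chain rule from the loss back to each weight.
    d_yhat = 2.0 * (y_hat - Y) / len(X)
    dW2 = h1.T @ d_yhat
    db2 = d_yhat.sum(axis=0)
    d_z1 = (d_yhat @ W2.T) * (z1 > 0)            # ReLU derivative
    dW1 = X.T @ d_z1
    db1 = d_z1.sum(axis=0)

    # Weight optimisation: step opposite to the gradient (Eq. 1.3).
    W1 -= alpha * dW1
    b1 -= alpha * db1
    W2 -= alpha * dW2
    b2 -= alpha * db2
```

In practice, frameworks such as TensorFlow and PyTorch compute these gradients automatically, but the structure of the training loop is the same.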

Gradient descent computes the gradients using the full sample, which can be computationally expensive. Stochastic gradient descent (SGD) is a variation of gradient descent that estimates the gradients from randomly shuffled subsets of the training sample that are smaller than the whole sample (e.g. 100, 128, or 256 examples). Each of these subsets is called a batch. Gradients from batches are typically noisier than those estimated from the whole sample, so the network takes more iterations to converge. Nevertheless, each SGD iteration is computationally much cheaper than a full gradient descent step, which makes SGD the commonly preferred algorithm to optimise neural networks.
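Stochastic gradient descent changes only how the gradient is estimated. A minimal sketch for a linear model y = w x + b, assuming a toy data set, a learning rate of 0.1, and batches of 128 examples reshuffled at every pass over the data (epoch):

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative toy data: y = 3 x - 1 plus Gaussian noise.
N = 1000
X = rng.uniform(-1.0, 1.0, size=N)
y = 3.0 * X - 1.0 + rng.normal(0.0, 0.1, size=N)

w, b = 0.0, 0.0      # trainable parameters
alpha = 0.1          # learning rate
batch_size = 128     # one of the typical batch sizes quoted in the text

for epoch in range(50):
    order = rng.permutation(N)                 # reshuffle the sample every epoch
    for start in range(0, N, batch_size):
        idx = order[start:start + batch_size]  # indices of one batch
        err = (w * X[idx] + b) - y[idx]
        # Gradients of the batch MSE: a noisy estimate of the full-sample gradient.
        grad_w = 2.0 * np.mean(err * X[idx])
        grad_b = 2.0 * np.mean(err)
        w -= alpha * grad_w
        b -= alpha * grad_b
```

Each epoch performs eight cheap updates instead of one full-sample update, which is the trade-off described above.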

Nowadays, there are many types of ANN, each of them used for different purposes.

The simplest one is the multi-layer perceptron (MLP), also named linear network or fully-connected network. It consists of a sequence of layers in which every perceptron in one layer is connected to all the perceptrons in the following one. Figure 1.1 is an example of a three-layer fully-connected neural network. It contains two input neurons that are fully connected to the six neurons in the hidden layer, which in turn are all connected to the output neuron.
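The three-layer network of Fig. 1.1 (two inputs, six hidden neurons, one output) can be sketched by applying Eq. 1.1 layer by layer. The ReLU in the hidden layer, the linear output layer, and the initialisation scale are assumptions for illustration. Counting parameters shows that even this tiny network has 25 trainable values: 2 × 6 + 6 in the first layer and 6 × 1 + 1 in the second.

```python
import numpy as np

def init_mlp(layer_sizes, rng):
    """Random weights (shape (n_out, n_in)) and zero biases for each layer."""
    return [(rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_out, n_in)), np.zeros(n_out))
            for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

def mlp_forward(x, params):
    """Apply Eq. 1.1 layer by layer: ReLU on hidden layers, linear output layer."""
    for i, (w, b) in enumerate(params):
        x = w @ x + b
        if i < len(params) - 1:
            x = np.maximum(0.0, x)
    return x

def n_trainable(params):
    """Total number of weights and biases."""
    return sum(w.size + b.size for w, b in params)

params = init_mlp([2, 6, 1], np.random.default_rng(0))   # the layout of Fig. 1.1
y = mlp_forward(np.array([0.3, -1.2]), params)           # a single prediction
print(n_trainable(params))                               # (2*6 + 6) + (6*1 + 1) = 25
```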

Besides MLPs, in this thesis we have used CNNs and mixture density networks, which are explained in §1.2.1 and §1.2.2, respectively.

1.2.1 Convolutional Neural Networks

Convolutional neural networks have proven successful in many image-related applications, e.g. image classification (Sultana et al., 2019), semantic image segmentation (Liu et al., 2018), and object detection (Zhao et al., 2018). CNNs are a type of ANN composed of convolutional layers. In contrast to a fully-connected ANN, where each layer applies Eq. 1.1 to the whole input vector, a convolutional layer essentially slides convolutional kernels over the input image.

[Figure 1.2: input image ⊗ convolutional kernel = output image.]

Figure 1.2: Example of convolution. The leftmost matrix corresponds to the input image, the middle yellow matrix is the convolutional kernel, and the rightmost one is the output image.

[Figure 1.3: input feature map and the resulting map after max pooling.]

Figure 1.3: Example of the max-pooling operation. The left matrix is the input image and the right one is the result after pooling.

[Figure 1.4: panels labelled INPUT IMAGE; CONVOLUTIONAL LAYER (convolutional filters); POOLING LAYER (2x2 max pooling); ACTIVATION LAYER (ReLU activation function); BATCH NORMALISATION.]

Figure 1.4: Example of a CNN composed of a convolutional layer, a pooling layer, a ReLU activation layer, and a batch normalisation layer, which comes after the activation layer.

The kernels of a convolutional layer are (typically) 4-dimensional arrays of trainable parameters, with one dimension each for the number of kernels, the number of input channels, and the kernel height and width.

When the kernel passes over a grid of pixels of the same size as the kernel, each pixel is multiplied by the corresponding value in the kernel and the products are summed into a single number. The convolution over the complete image creates a new representation of the input data.

The left panel of Fig. 1.2 shows an example of a convolution operation. The leftmost matrix represents the input image and the yellow matrix in the centre is the convolutional kernel. The coloured grid on the input image is multiplied element-wise by the kernel and the products are summed, resulting in the bluish value in the rightmost matrix. This procedure is repeated for each 2 × 2 group of pixels. Note that the convolution reduces the image dimension from 4 × 4 to 3 × 3, as a 2 × 2 kernel can only be placed at 3 positions along each direction of a 4 × 4 matrix. To avoid this dimensional reduction one can apply padding, i.e. add extra rows and columns to the input image, which enables one more position per direction. The added values are commonly either zeros or copies of the pixels at the edge of the image.
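The sliding-window operation described above can be sketched for a single-channel image with stride 1. As in most deep learning libraries, the kernel is not flipped, so strictly speaking this is a cross-correlation. The input and kernel values are illustrative, not those of Fig. 1.2, and the `pad` argument of this hypothetical helper appends zero rows and columns at the bottom and right, which for a 2 × 2 kernel is exactly what keeps a 4 × 4 output.

```python
import numpy as np

def conv2d(image, kernel, pad=0):
    """Single-channel 2-D convolution as used in CNNs (stride 1, kernel not flipped).

    `pad` rows and columns of zeros are appended at the bottom and right of the
    image before the kernel is slid over it.
    """
    if pad > 0:
        image = np.pad(image, ((0, pad), (0, pad)), mode="constant")
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1   # number of kernel positions vertically
    out_w = image.shape[1] - kw + 1   # number of kernel positions horizontally
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Multiply the pixels under the kernel by the kernel values and sum them.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(16, dtype=float).reshape(4, 4)   # illustrative 4x4 input
kernel = np.array([[1.0, 0.0],
                   [0.0, -1.0]])                    # illustrative 2x2 kernel
valid = conv2d(image, kernel)          # shape (3, 3): every entry is -5 here
same = conv2d(image, kernel, pad=1)    # shape (4, 4): padding keeps the input size
```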

Multiple convolutional kernels can be applied within a convolutional layer. Each kernel creates a new feature map focusing on different data traits. Stacking convolutional layers enables shallow layers, i.e. layers close to the input layer, to learn low-level features (e.g. edges and lines), while deeper layers learn more complex features (e.g. shapes).

As the number of convolutional layers and kernels per layer increases, so does the number of trainable parameters and the amount of data (and memory) that the network needs to handle.
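To make this growth concrete, the parameter count of one convolutional layer with square kernels is n_kernels × n_input_channels × kernel_size² weights plus one bias per kernel. The layer sizes below are illustrative assumptions (e.g. a stack of 40 narrow-band images as input), not the architecture used in this thesis.

```python
def conv_layer_params(kernel_size, in_channels, n_kernels, bias=True):
    """Trainable parameters of a 2-D convolutional layer with square kernels.

    The weights form a 4-D array of shape
    (n_kernels, in_channels, kernel_size, kernel_size), plus one bias per kernel.
    """
    weights = n_kernels * in_channels * kernel_size * kernel_size
    return weights + (n_kernels if bias else 0)

# Illustrative: 3x3 kernels on a 40-channel input, then 32 -> 64 kernels.
print(conv_layer_params(3, 40, 32))  # 32*40*9 + 32 = 11552
print(conv_layer_params(3, 32, 64))  # 64*32*9 + 64 = 18496
```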

Convolutional layers enable CNNs to exploit the local spatial coherence of images (i.e. the fact that spatially close pixels jointly carry meaning) to reduce the number of operations required to process an image. Furthermore, CNNs also learn from the order of their inputs: considering every pixel of an image as an input feature, the CNN knows where each pixel is located and uses this information to make predictions.

CNNs also use the spatial coherence of images to effectively reduce the dimension of the input features with the so-called pooling layers (Gholamalinezhad & Khosravi, 2020). Pooling layers apply a simple operation (e.g. the maximum or the average) to reduce a group of pixels in the feature map to a single value. These layers therefore down-sample the feature maps, creating a smaller representation of each feature map separately. The right panel of Fig. 1.3 shows an example of 2 × 2 max pooling, where each 2 × 2 pixel grid in the input image is replaced by its maximum value.
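A minimal sketch of 2 × 2 max pooling, assuming non-overlapping windows (stride 2), a common choice; the feature-map values are illustrative:

```python
import numpy as np

def max_pool2d(fmap, size=2):
    """Non-overlapping max pooling (stride = size) of a 2-D feature map.

    Rows/columns that do not fill a complete window are dropped.
    """
    h, w = fmap.shape[0] // size, fmap.shape[1] // size
    # Reshape so that each size x size window gets its own pair of axes (1 and 3).
    blocks = fmap[:h * size, :w * size].reshape(h, size, w, size)
    return blocks.max(axis=(1, 3))

fmap = np.array([[1, 3, 2, 0],
                 [4, 2, 1, 5],
                 [0, 1, 7, 2],
                 [6, 2, 3, 1]])
pooled = max_pool2d(fmap)   # [[4, 5], [6, 7]]: the maximum of each 2x2 block
```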

Pooling layers also help summarise the presence of features in the input image. Two images of the same object can differ slightly as a result of e.g. rotation or cropping, and after the convolutional layer these images will result in different feature maps. The pooling layer helps to regularise the differences between slightly different feature maps by applying operations over groups of nearby, correlated pixels.

The last type of layer we introduce here is the batch normalisation layer (Ioffe & Szegedy, 2015). This layer is particularly helpful when training deep neural networks with many hidden layers. It is commonly implemented after the activation function and consists of re-scaling, batch by batch, the activated output of the previous layer so that it has zero mean and unit variance (standardisation). During backpropagation, weights are updated layer by layer, assuming that the weights in all the other layers are fixed. However, this is not the case, since backpropagation updates all the layers in the network at each iteration, which hinders the loss minimisation. Batch normalisation helps coordinate the updates of the different layers in the model, speeding up convergence and making the learning more robust.
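The standardisation performed by a batch normalisation layer during training can be sketched as follows. The scale γ and shift β are trainable parameters in Ioffe & Szegedy (2015) and are fixed to 1 and 0 here; the small constant eps (an assumed value of 1e-5) avoids division by zero. At inference time, running averages of the mean and variance are used instead of batch statistics, which this sketch omits.

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Training-mode batch normalisation of activations x with shape (batch, features).

    Each feature is standardised with the mean and variance of the current batch,
    then scaled by gamma and shifted by beta (both trainable in a real network).
    """
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

rng = np.random.default_rng(1)
activations = rng.normal(loc=5.0, scale=3.0, size=(128, 4))  # one batch, 4 features
normed = batch_norm(activations)   # per-feature mean ~0 and standard deviation ~1
```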

Figure 1.4 presents an example of a CNN composed of a convolutional layer, a max-pooling layer, a ReLU activation layer, and batch normalisation. Convolving the input image with the convolutional kernel highlights certain parts of the input image and smooths others. The pooling layer further emphasises the features highlighted by the convolutional layer.

1.2.2 Mixture density networks

So far, the presented networks predict a single value. However, scientific applications often require an assessment of the uncertainty of the predictions. Mixture density networks (MDN, Bishop, 1994) predict the probability distribution of the quantity y given the data D as a weighted sum of k distributions, which can be any sort of basis function, e.g. Gaussian functions, such that

$$p(y|D) = \sum_{i=1}^{k} \alpha_i \, \mathcal{N}_i(\mu_i, \sigma_i) \qquad (1.4)$$

where N_i(µ_i, σ_i) is the i-th Gaussian component with mean µ_i and standard deviation σ_i. The α_i parameters are the so-called mixing coefficients, which give the relative contribution of each Gaussian component to the total probability distribution; they are non-negative and sum to one, so that p(y|D) is normalised.
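Eq. 1.4 can be evaluated directly on a grid of y values. The sketch below uses illustrative mixture parameters (k = 3) and checks numerically that, with mixing coefficients summing to one, the mixture integrates to one.

```python
import numpy as np

def gaussian_pdf(y, mu, sigma):
    """Normal probability density with mean mu and standard deviation sigma."""
    return np.exp(-0.5 * ((y - mu) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)

def mixture_pdf(y, alpha, mu, sigma):
    """p(y|D) = sum_i alpha_i N_i(mu_i, sigma_i) (Eq. 1.4), evaluated at points y."""
    y = np.asarray(y)[..., None]          # broadcast points against components
    return np.sum(alpha * gaussian_pdf(y, mu, sigma), axis=-1)

# Illustrative parameters of a k = 3 mixture (the alphas sum to one).
alpha = np.array([0.5, 0.3, 0.2])
mu = np.array([0.4, 0.9, 1.5])
sigma = np.array([0.05, 0.1, 0.3])

grid = np.linspace(-2.0, 4.0, 20001)
p = mixture_pdf(grid, alpha, mu, sigma)
area = np.sum(p) * (grid[1] - grid[0])   # close to 1: the mixture is a normalised PDF
```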

MDNs combine a neural network with a mixture density model. The neural network, which can be of any type (e.g. a CNN, §1.2.1), takes the input data D and converts them into a set of values that parametrise the mixture model. The mixture model describes the data using several distributions that can be written in a simple parametric form (e.g. Gaussians, as in Eq. 1.4). Figure 1.5 presents a Gaussian MDN. The blue rectangles represent a 3-layer ANN that, given two input values, outputs the means and standard deviations of k Gaussians (yellow points), together with the mixing coefficients α (Eq. 1.4). These output parameters build the probability distribution of the predicted quantity.

A Gaussian MDN is trained with a loss corresponding to the negative log-likelihood of a linear combination of Gaussian distributions, i.e.

$$\mathcal{L}_{\rm MDN} = -\log p(y|D) = -\log \sum_{i=1}^{k} \exp\left[\log(\alpha_i) - \frac{(y-\mu_i)^2}{2\sigma_i^2} - \log(\sigma_i)\right] + \frac{1}{2}\log(2\pi) \qquad (1.5)$$

Minimising this loss corresponds to maximising the likelihood L(D|θ), where θ are the µ, σ, and α parameters modelling the Gaussians in Eq. 1.4.
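In practice, the network's raw outputs must be mapped to valid mixture parameters before evaluating Eq. 1.5. A common choice, assumed here and not necessarily the one used in this thesis, is a softmax for the mixing coefficients and an exponential for the standard deviations; the log-sum-exp trick keeps the sum in Eq. 1.5 numerically stable.

```python
import numpy as np

def log_softmax(z):
    """Numerically stable log of the softmax along the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.sum(np.exp(z), axis=-1, keepdims=True))

def logsumexp(a):
    """Numerically stable log(sum(exp(a))) along the last axis."""
    m = a.max(axis=-1, keepdims=True)
    return (m + np.log(np.sum(np.exp(a - m), axis=-1, keepdims=True)))[..., 0]

def mdn_nll(raw_alpha, mu, raw_sigma, y):
    """Mean negative log-likelihood (Eq. 1.5) of targets y under a Gaussian MDN.

    raw_alpha, mu, raw_sigma: network outputs of shape (batch, k); y: shape (batch,).
    Softmax makes the mixing coefficients positive and normalised; exp makes the
    standard deviations positive.
    """
    log_alpha = log_softmax(raw_alpha)
    log_sigma = raw_sigma                      # sigma = exp(raw_sigma)
    z = (y[:, None] - mu) / np.exp(log_sigma)
    log_terms = log_alpha - 0.5 * z ** 2 - log_sigma - 0.5 * np.log(2 * np.pi)
    return -np.mean(logsumexp(log_terms))

# Two equally weighted components evaluated at a single target value.
raw_alpha = np.array([[0.0, 0.0]])
mu = np.array([[0.0, 2.0]])
raw_sigma = np.log(np.array([[1.0, 0.5]]))
loss = mdn_nll(raw_alpha, mu, raw_sigma, np.array([1.0]))
```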

1.2.3 Multi-task learning

Deep learning algorithms typically train a single model, or an ensemble of models, to perform a single task accurately (e.g. predicting the redshift). Multi-task learning (MTL) is a training methodology that aims to improve the performance on a task by training the model on multiple related tasks simultaneously (Caruana, 1997). A pedagogical example is a network used to classify images of cats and dogs. If the same network is simultaneously trained to classify the shape of the ears, e.g. spiky or rounded, it will learn correlations between the animal type and the ear shape, e.g. dogs mostly have rounded ears, in such a way that the ear-shape predictions help in the cat-dog classification.

There are two main types of MTL architectures: soft- and hard-parameter sharing (Zhang & Yang, 2021). Hard-parameter sharing architectures are the most common type of MTL and the one used in this thesis. This implementation shares a set of hidden layers among tasks, while each task also has task-specific layers after the shared ones.

Figure 1.6 shows an example of a three-task hard-parameter sharing MTL network, built of three shared layers (blue layers) and a single task-specific layer per task (tomato-red layers).

Sharing hidden layers forces the network to learn representations that generalise for all tasks.

Although the example in Fig. 1.6 has a single task-specific layer per task, this is commonly extended to several. In soft-parameter sharing architectures, on the other hand, each task has its own model and there are no shared layers. Instead, the distance between the parameters of the different models is regularised to keep them similar.
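A hard-parameter sharing network like that of Fig. 1.6 can be sketched as a shared trunk followed by one output layer per task. The layer sizes, the ReLU activations, and the weighted sum of per-task losses used to train the shared layers are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(n_in, n_out):
    """Random weights of shape (n_in, n_out) and zero biases."""
    return rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_in, n_out)), np.zeros(n_out)

def relu(x):
    return np.maximum(0.0, x)

# Shared trunk: three layers whose weights are used by every task (blue in Fig. 1.6).
shared = [dense(10, 32), dense(32, 32), dense(32, 16)]
# One task-specific output layer per task (red in Fig. 1.6); output sizes illustrative.
heads = {"task_1": dense(16, 1), "task_2": dense(16, 3), "task_3": dense(16, 2)}

def mtl_forward(x):
    h = x
    for w, b in shared:
        h = relu(h @ w + b)               # common representation for all tasks
    return {name: h @ w + b for name, (w, b) in heads.items()}

def total_loss(task_losses, weights):
    """The shared layers receive gradients from a weighted sum of the per-task losses."""
    return sum(weights[name] * loss for name, loss in task_losses.items())

outputs = mtl_forward(rng.normal(size=(8, 10)))   # a batch of 8 examples
```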

MTL has already been successfully applied in fields such as video processing, where Song et al. (2020) use MTL to simultaneously predict the edge and disparity maps in stereo video processing¹. Another example is Moeskops et al. (2017), where an MTL network is trained to simultaneously segment tissues in brain images, the pectoral muscle in breast images, and the coronary arteries.

¹Stereo video is the practice of producing the illusion of 3D images in moving form. Disparity maps display the apparent pixel difference between a pair of stereo images, i.e. images of the same scene taken from different perspectives, and edge maps indicate the position of the edges in the image.

[Figure 1.5: a neural network outputs α(x), µ(x), and σ(x), which a mixture model combines into p(y|x).]

Figure 1.5: Mixture density network scheme. The first part represents a neural network extracting features from the input data, while the second is a mixture model constructing the output probability distribution from the network's output features.

[Figure 1.6: shared layers followed by task-specific layers for TASK 1, TASK 2, and TASK 3.]

Figure 1.6: Multi-task learning scheme. This particular example corresponds to a hard-parameter sharing architecture, where all tasks share a set of common layers (blue layers). The red layers represent the task-specific layers.

1.3 Deep learning in astronomy

Astronomy is experiencing an explosive growth of data as a result of past, current, and upcoming surveys (Mickaelian, 2016; Zhang & Zhao, 2015). For example, the Palomar Digital Sky Survey (DPOSS, Djorgovski et al., 1998) and the Two Micron All-Sky Survey (2MASS, Skrutskie et al., 2006) generated 3 TB and 10 TB of data, respectively. This increased to 40 TB for the Sloan Digital Sky Survey (SDSS, Ahumada et al., 2020) and is expected to rise to 40 PB and 200 PB for the Panoramic Survey Telescope and Rapid Response System (PanSTARRS, Magnier et al., 2020) and the Rubin Observatory Legacy Survey of Space and Time (LSST, Ivezić et al., 2019a), respectively. Technological improvements have enabled the construction of larger and more powerful telescopes and cameras, contributing to the rapid growth of astronomical data.

Furthermore, astronomical data encompass different data types and complexities, including images, spectra, simulations, and time series. For example, SDSS observed the spectra of millions of galaxies and multi-colour images of one-third of the sky, the Dark Energy Survey (DES, The Dark Energy Survey Collaboration, 2005) imaged 5000 deg² of the southern sky (∼300 million galaxies) in five optical filters, and the Kilo-Degree Survey (KiDS, de Jong et al., 2013) imaged two areas of 750 deg² in four optical filters and five near-infrared bands.

Other surveys, such as Gaia (Gaia Collaboration, 2018), also produce multi-temporal data. Gaia is accurately mapping the Milky Way by measuring the motion of its stars around the centre of the Galaxy.

The increasing amount of astronomical data to analyse has fostered the use of data-driven tools in astronomical data analysis. Furthermore, future surveys like LSST, the Dark Energy Spectroscopic Instrument (DESI, DESI Collaboration et al., 2016), and Euclid (Laureijs et al., 2011) will increase the number of observed astronomical objects by more than an order of magnitude, increasing the need for fast, automated tools to process all the data. While training deep learning models can be time-consuming, evaluating them on new data is fast.


There are many examples of deep learning implementations in astronomy. Traditionally, these worked at the catalogue level, but recently, with the development of very powerful CNNs, there has been increasing interest in implementations at the image level. This opens a new research path full of possibilities, such as automated classification. Training a deep learning model to classify astronomical objects from images enables automated real-time object classification (Narayan et al., 2018), which also allows a rapid follow-up of rare phenomena.

In this thesis, we have developed a deep learning end-to-end pipeline to measure the photometry and the photometric redshift of galaxies directly from the images. This enables a fast evaluation of both quantities, reducing the number of processing steps and exploiting the information available in the images.

1.3.1 Object classification

One of the most studied deep learning applications in astronomy is object classification, e.g. star-galaxy and galaxy morphology classification. Most traditional star-galaxy classifiers use summary information from catalogues. In Ball et al. (2006), SDSS colours (i.e. u − g, g − r, r − i, and i − z) are used to classify all 143 million photometric objects in SDSS-DR3. Also, in Cabayol et al. (2019), objects from the COSMOS field are classified based on 40 narrow-band colours using a 1D CNN. Baqui et al. (2021) test six different machine learning algorithms (e.g. k-nearest neighbours and decision trees) to distinguish stars and galaxies using 56 narrow-band filters and 4 ugri broad-band filters. Approaches addressing the classification at the image level use both the photometry encoded in the image and the morphology of the source to predict the object type. One example is Kim & Brunner (2016), where a CNN is trained on SDSS images in the five photometric bands ugriz down to r < 23.

Traditionally, galaxy classification has relied only on galaxy morphology. The most common galaxy classification scheme was proposed by Hubble in 1936 and splits galaxies into four broad types based on their morphology: elliptical, spiral, barred-spiral, and irregular, each containing several sub-classes (Hubble, 1922; Hubble, 1926; Hubble, 1927; Hubble & Tolman, 1935). For most of the 20th century, galaxy classification was tackled by visual inspection by groups of astronomers (e.g. de Vaucouleurs et al., 1991). With modern survey data, visual inspection of the entire catalogue is infeasible due to the large number of observed galaxies. Moreover, quantifying the classification error requires multiple independent classifications per galaxy. Galaxy Zoo was created to solve this problem. It is a crowd-sourcing project to classify more than 60 million SDSS galaxies based on online citizen visual inspection (Lintott et al., 2008).

CNNs offer an alternative to visual inspection (Khalifa et al., 2017; Domínguez Sánchez et al., 2018). Working directly on galaxy images, these networks can provide a morphological classification for millions of objects using all the information available in the image (e.g. morphology, photometry, and environment). The classification of bright galaxies (Zhu et al., 2019; Goddard & Shamir, 2020) is addressed with annotated data sets like Galaxy Zoo 2 (Willett et al., 2013) or catalogues with reliably known galaxy morphologies, e.g. the Principal Galaxy Catalogue (Paturel et al., 2003) or the Value-Added Galaxy Catalogue (Choi et al., 2010). Furthermore, CNNs also offer a solution for the galaxy classification of deep data sets, where the faintest galaxies are hardly distinguishable from the background and visual morphological inspection is not possible. This could potentially be addressed by using galaxy image simulations to train the CNN, although there are not many examples in the literature yet.

1.3.2 Photometric redshift estimation

Machine learning has also been extensively applied to photometric redshift (photo-z) estimation and presents an alternative to template-based spectral energy distribution (SED) fitting methods (e.g. LePhare, Arnouts & Ilbert 2011; BPz, Benítez 2011; ZEBRA, Feldmann et al. 2006; EAZY, Brammer et al. 2008). A wide variety of machine learning algorithms has been used to tackle photo-z estimation, such as tree-based methods (e.g. Carliles et al. 2010; Gerdes et al. 2010; Carrasco Kind & Brunner 2013), support vector machines (SVM, e.g. Wadadekar 2005; Wang et al. 2008), and fully-connected ANNs (e.g. Collister & Lahav 2004; Bonnett 2015a), most of them using photometric features to make redshift predictions.

Recently, efforts have also focused on determining photometric redshifts directly from astronomical images using CNNs. D'Isanto & Polsterer (2018) compare the performance of traditional redshift estimation methods using photometric features with a deep CNN predicting photo-z from astronomical images, obtaining a redshift precision comparable to state-of-the-art results for bright galaxies. Furthermore, Pasquet et al. (2019) determine the photometric redshifts of bright galaxies in the Main Galaxy Sample of the Sloan Digital Sky Survey at z < 0.4 with a CNN applied to the ugriz images. In this thesis, we present a novel deep learning method to predict multi-band photometry and photo-z from images (§7).

1.3.3 Other applications

A decade ago, almost all machine learning implementations in astronomy would have been related to object classification and photo-z estimation. Nowadays, the hype around machine learning has also reached astronomy, and machine learning implementations have spread to several other science cases.

Examples include galaxy deblending, which will become a crucial step in the data reduction of upcoming surveys such as LSST. Traditionally, deblenders have mostly relied on analytical modelling of the blended galaxies, which requires very accurate galaxy models.

Recently, more robust deep learning deblending algorithms have also been developed (Boucaud et al., 2020; Arcelin et al., 2021). Neural networks have also been developed to correct shear measurements for nuisance effects (Tewes et al., 2019; Matilla et al., 2020), including e.g. instrument optics, blending, and unknown galaxy morphologies. Furthermore, Gupta et al. (2018) use CNNs to derive cosmological constraints from weak lensing maps.


Chapter 2 The PAU Survey

In this chapter, we introduce galaxy surveys (§2.1), focusing on imaging surveys (§2.1.1) to introduce the PAU Survey (§2.2).

2.1 Galaxy surveys

A large fraction of the data collected from the Universe arrives as electromagnetic radiation, ranging from low-energy radio photons (Wilson, 2011; Lacy et al., 2020) to very energetic gamma rays (Di Sciascio, 2019; Mazin, 2019), and including the optical light studied by optical astronomy (The Dark Energy Survey Collaboration, 2005; Martí et al., 2014; Benitez et al., 2014). Galaxy surveys map a portion of the sky and provide the fundamental data on galaxies and their distribution in the Universe.

There are two widely used techniques to observe the Universe: spectroscopy and photometry. Spectrographs disperse the light by wavelength so that the amount of light in small wavelength intervals can be measured. Spectroscopy measures the spectral energy distribution (SED), i.e. the amount of energy per unit time, unit area, and unit wavelength of an astronomical source, which enables very precise estimates of galaxy redshifts. In contrast, imaging surveys image the sky through optical and near-infrared (NIR) photometric filters, which increases the number of observed galaxies by ∼2 orders of magnitude.

Ideally, galaxy surveys should cover wide sky areas with fine angular and wavelength resolution. Unfortunately, observational resources are limited, and a high wavelength resolution commonly comes at the expense of a fine angular resolution over a wide sky area (and vice versa). While spectroscopic surveys (e.g. Ahumada et al., 2020; Scodeggio et al., 2018; Driver et al., 2011) can provide very high-resolution spectra, they demand long exposure times.

Moreover, spectroscopic surveys require targeted observations, which can cause target-selection effects due to, e.g., the surface-brightness detection limit of the imaging data used to select the targets. In contrast, imaging photometric surveys (e.g. de Jong et al., 2013; Ivezić et al., 2019a) offer an alternative that covers larger areas of the sky with better angular resolution, at the cost of a significantly worse wavelength resolution.


Advances in observational technology (i.e. telescopes and detectors) have enabled galaxy surveys to grow from a very few observed galaxies to billions of them. Optical imaging sky surveys started in the pre-photography era with visual observations. An early example is the catalogue set up by Messier in 1774 (Messier, 1774), which contained 110 astronomical objects. Other examples of pre-photography catalogues are the still-used New General Catalogue (Dreyer, 1888) and the Index Catalogue (Dreyer, 1895) by John Dreyer.

Photography and monitoring systems transformed sky surveys, enabling a systematic coverage of large areas of the sky. During the 20th century, several sky surveys provided astronomical catalogues containing up to hundreds of thousands of objects, mostly stars. Some examples are the Smithsonian Astrophysical Observatory Catalog¹, which contains positions, proper motions, and magnitudes for over 250 000 stars, and the Henry Draper Catalogue², containing the spectral types of ∼360 000 stars. Photography also enabled the monitoring of the Magellanic Clouds, which led to the discovery of the crucial period-luminosity relation for Cepheids in 1912 (Leavitt & Pickering, 1912), later used by Hubble in the discovery of the expansion of the Universe.

In the second half of the century, the development of Schmidt telescopes led to the POSS-I survey, a major milestone for galaxy surveys (Minkowski & Abell, 1963). POSS-I mapped about two-thirds of the observable sky from Palomar Mountain, providing catalogues such as the Morphological Catalog of Galaxies³, with ∼30 000 galaxies. Furthermore, the first spectroscopic surveys were designed in the early eighties and provided the first evidence of large-scale structure in the nearby Universe. The first Center for Astrophysics redshift survey (CfA, Geller & Huchra, 1983) was the first of these spectroscopic surveys and observed ∼2300 galaxy spectra down to m_AB ∼ 14.5.

The development of charge-coupled devices (CCDs) brought unprecedented progress to astronomy with fully digital sky surveys. CCDs are silicon chips made up of an array of light-sensitive elements (pixels), arranged in rows and columns, that accumulate charge when light hits them (Lesser, 2015). SDSS (Gunn et al., 1998; York et al., 2000) was the first CCD survey, which eventually covered 14 500 deg² and collected 116 TB of data (Alam et al., 2015). SDSS was fundamental in transforming astronomy and enabled a wide range of science applications.

Technological advances also transformed spectroscopic surveys. Multi-fibre spectrographs opened the way to massive redshift surveys, e.g. 2dF (Colless et al., 2001a) and SDSS (Ahumada et al., 2020), which together have provided more than a million galaxy redshifts.

2.1.1 Imaging surveys

The wavelength resolution of imaging surveys depends on the set of photometric filters used, i.e. the photometric system. A photometric system is characterised by the number, width, and wavelength coverage of its filters. Broad-band systems have a few filters of width ∼1000 Å (e.g. Honscheid & DePoy, 2008; Doi et al., 2010). Narrow-band systems (Molino et al., 2013; Padilla et al., 2016) consist of a larger number of narrower filters with widths of ∼100 Å.

¹ https://heasarc.gsfc.nasa.gov/W3Browse/star-catalog/sao.html

² http://server6.sky-map.org/group?id=23

³ https://heasarc.gsfc.nasa.gov/W3Browse/galaxy-catalog/mcg.html

Figure 2.1: Top left: SDSS galaxy spectra, flux density f (10⁻¹⁷ erg/s/cm²/Å) versus wavelength (Å). Top right: the SDSS griz broad-band transmission curves. Bottom: the PAUS narrow-band transmission curves.

Broad-band photometric systems provide low spectral resolution, but they enable observing large sky areas with high angular resolution. Narrow-band systems increase the wavelength resolution but typically cover smaller sky areas, because the survey must observe the same sky region more times to cover the same wavelength range. Furthermore, for the same exposure time, a narrow-band filter detects fewer photons than a broader one. The result is a lower signal-to-noise ratio, longer exposure times to collect sufficient signal, or a trade-off between the two.
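The photon-count argument above can be made concrete with a minimal sketch. It rests on several assumptions:

- the source spectrum and the sky background are flat;
- the noise is pure Poisson noise, with read noise and dark current neglected;
- the filters are ideal, so detected counts scale as (count rate per ångström) × (filter width) × (exposure time).

The function names and the numbers in the example are hypothetical.

```python
import math


def detected_counts(rate_per_angstrom, width_angstrom, exposure_s):
    """Counts collected through a filter, assuming a flat spectrum.

    rate_per_angstrom: detected count rate per unit wavelength [counts s^-1 Å^-1]
    width_angstrom:    filter width [Å]
    exposure_s:        exposure time [s]
    """
    return rate_per_angstrom * width_angstrom * exposure_s


def poisson_snr(source_rate, sky_rate, width_angstrom, exposure_s):
    """Signal-to-noise ratio S / sqrt(S + B) with Poisson noise only."""
    s = detected_counts(source_rate, width_angstrom, exposure_s)
    b = detected_counts(sky_rate, width_angstrom, exposure_s)
    return s / math.sqrt(s + b)


def exposure_for_equal_snr(t_ref, width_ref, width_new):
    """Exposure time giving the same S/N in a filter of width width_new.

    S and B both scale with width * time, so S/N depends only on that product.
    """
    return t_ref * width_ref / width_new


# Hypothetical example: a ~1000 Å broad band versus a ~100 Å narrow band
broad = poisson_snr(0.2, 1.0, 1000.0, 100.0)
narrow = poisson_snr(0.2, 1.0, 100.0, 100.0)
t_narrow = exposure_for_equal_snr(100.0, 1000.0, 100.0)
narrow_long = poisson_snr(0.2, 1.0, 100.0, t_narrow)
print(f"S/N broad: {broad:.1f}, narrow: {narrow:.1f}, "
      f"narrow with {t_narrow:.0f} s: {narrow_long:.1f}")
```

Under these assumptions, a filter ten times narrower lowers the S/N by a factor √10 ≈ 3.2 at fixed exposure time. Recovering the broad-band S/N requires a ten times longer exposure.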

Figure 2.1 shows example galaxy spectra measured by SDSS (top left). The top-right panel shows the SDSS griz filter transmission curves (Fukugita et al., 1996; Doi et al., 2010), and the bottom panel shows the PAUS narrow-band transmission curves (Casas et al., 2012). These plots show that broad-band imaging loses much of the wavelength resolution of spectroscopy. Narrow-band imaging provides a wavelength resolution in between broad-band imaging and spectroscopy. Table 2.2.4 shows a few examples of spectroscopic and photometric surveys and their characteristics.
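To illustrate why broad bands lose wavelength resolution, the sketch below integrates a toy spectrum through idealised top-hat filters. Several ingredients are assumptions:

- the spectrum, a flat continuum plus one Gaussian emission line at 6563 Å;
- the filter centres and widths, 1000 Å and 100 Å, chosen to mimic the widths quoted above;
- the helper names.

Real SDSS and PAUS transmission curves are not top-hats. The band flux uses the photon-counting weighting ∫ f_λ T λ dλ / ∫ T λ dλ.

```python
import numpy as np

# Uniform wavelength grid [Å] covering roughly the range of Figure 2.1
wavelength = np.linspace(4000.0, 9000.0, 50001)  # 0.1 Å spacing


def toy_spectrum(lam, continuum=1.0, line_center=6563.0, line_flux=200.0, line_sigma=5.0):
    """Flat continuum plus a Gaussian emission line (arbitrary flux-density units)."""
    norm = line_flux / (np.sqrt(2.0 * np.pi) * line_sigma)
    return continuum + norm * np.exp(-0.5 * ((lam - line_center) / line_sigma) ** 2)


def top_hat(lam, center, width):
    """Idealised filter: transmission 1 within center +/- width/2, 0 elsewhere."""
    return (np.abs(lam - center) <= width / 2.0).astype(float)


def band_flux(lam, flux, transmission):
    """Photon-weighted mean flux density through a filter.

    On a uniform grid the d(lambda) factors cancel, so plain sums suffice.
    """
    weights = transmission * lam
    return np.sum(flux * weights) / np.sum(weights)


spectrum = toy_spectrum(wavelength)
broad = band_flux(wavelength, spectrum, top_hat(wavelength, 6500.0, 1000.0))
narrow = band_flux(wavelength, spectrum, top_hat(wavelength, 6563.0, 100.0))
print(f"broad band: {broad:.2f}, narrow band: {narrow:.2f} (continuum = 1)")
```

Here the line has an equivalent width of 200 Å. It raises the narrow-band flux to roughly three times the continuum, but the broad-band flux by only about 20%, because the broad filter averages the line over ten times more continuum.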


Astronomical images

The images captured by the CCD camera are called raw images. They are the primary source of data, but they are strongly degraded by effects such as atmospheric turbulence, charge induction in the CCD electronics, and telescope movement (Morganson et al., 2018). Other contaminants, such as cosmic rays, very bright stars, and very massive galaxies, can also affect the quality of the images.

Each image covers the field of view of the instrument, i.e. the area of the sky captured in a single exposure. In ground-based telescopes, the atmosphere also affects the image by smearing out the light, a process known as seeing (Trujillo et al., 2001). Seeing reduces the resolution of an astronomical image, lowering the mean surface brightness of extended sources and increasing their observed radii. The effect is evident for a point-like source such as a distant star. Ideally its light would fall on a single pixel, but instead it spreads over a group of pixels.
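The broadening described above can be quantified in the simplest case. As an illustration only, not the survey's PSF model, assume that both the galaxy light profile and the seeing are circular Gaussians. Convolving two Gaussians adds their variances, so the FWHMs add in quadrature. For a fixed total flux, the peak surface brightness scales as 1/σ². The function names and example values are hypothetical.

```python
import math


def observed_fwhm(intrinsic_fwhm, seeing_fwhm):
    """FWHM after Gaussian seeing: variances add, so FWHMs add in quadrature."""
    return math.hypot(intrinsic_fwhm, seeing_fwhm)


def peak_brightness_ratio(intrinsic_fwhm, seeing_fwhm):
    """Observed / intrinsic peak surface brightness for a circular 2D Gaussian.

    With total flux F the peak is F / (2 pi sigma^2). The convolution leaves F
    unchanged and FWHM is proportional to sigma, so the ratio is
    (FWHM_int / FWHM_obs)^2.
    """
    return (intrinsic_fwhm / observed_fwhm(intrinsic_fwhm, seeing_fwhm)) ** 2


# Hypothetical galaxy with a 0.6 arcsec FWHM observed under 0.8 arcsec seeing
print(observed_fwhm(0.6, 0.8), peak_brightness_ratio(0.6, 0.8))
```

In this example the galaxy appears with a 1.0 arcsec FWHM, and its peak surface brightness drops to 36% of the intrinsic value. For a point source (zero intrinsic size), the observed FWHM is simply the seeing FWHM.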

Diffraction by the telescope aperture and the sampling of the image into discrete pixels (pixelisation) also contribute to the spreading. The point spread function (PSF) describes the combination of all these effects (seeing, pixelisation, and telescope optics). The observed image is the intrinsic galaxy image convolved with the PSF. For a normalised PSF, this preserves the total flux of an object while spreading it over a larger group of pixels. Mathematically, the value of the PSF-convolved pixel $\tilde{I}_{x,y}$ at position (x, y) is

$$\tilde{I}_{x,y} = \sum_{i=-a}^{a} \sum_{j=-b}^{b} K_{i,j}\, I_{x+i,\,y+j}, \qquad (2.1)$$

where $I$ is the galaxy image without PSF effects and $K$ is the PSF kernel, assumed to have dimensions $(2a+1) \times (2b+1)$. The PSF can be modelled with any suitable function, e.g. a Gaussian. In astronomy, the PSF is most commonly modelled with the Moffat function, which has more extended wings than a Gaussian (Eq. 3.15).
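Equation (2.1) can be written almost literally in NumPy. The sketch below works in three steps:

1. It builds a Moffat kernel using the common parameterisation K(r) ∝ [1 + (r/α)²]^(−β), with α = FWHM / (2√(2^(1/β) − 1)).
2. It normalises the kernel to unit sum, so that total flux is preserved.
3. It applies Eq. (2.1) by summing shifted, weighted copies of the image.

The value β = 3, the zero padding at the image borders, and the function names are illustrative assumptions. The Moffat form of Eq. 3.15 may be parameterised differently.

```python
import numpy as np


def moffat_kernel(a, b, fwhm, beta=3.0):
    """Moffat PSF sampled on a (2a+1) x (2b+1) grid and normalised to unit sum.

    alpha is chosen so that the profile falls to half its peak at r = fwhm / 2.
    beta = 3 is an illustrative value, not one taken from the survey.
    """
    alpha = fwhm / (2.0 * np.sqrt(2.0 ** (1.0 / beta) - 1.0))
    i = np.arange(-a, a + 1)[:, None]  # offsets along x (rows)
    j = np.arange(-b, b + 1)[None, :]  # offsets along y (columns)
    kernel = (1.0 + (i**2 + j**2) / alpha**2) ** (-beta)
    return kernel / kernel.sum()  # unit sum => total flux is preserved


def apply_psf(image, kernel):
    """Direct implementation of Eq. (2.1): I~[x, y] = sum_ij K[i, j] * I[x + i, y + j].

    Pixels outside the image are treated as zero (zero padding, an assumption).
    """
    a = (kernel.shape[0] - 1) // 2
    b = (kernel.shape[1] - 1) // 2
    padded = np.pad(image, ((a, a), (b, b)))  # constant (zero) padding by default
    nx, ny = image.shape
    out = np.zeros(image.shape, dtype=float)
    for i in range(-a, a + 1):
        for j in range(-b, b + 1):
            # padded[x + i + a, y + j + b] equals I[x + i, y + j]
            out += kernel[i + a, j + b] * padded[a + i:a + i + nx, b + j:b + j + ny]
    return out


# Point source in the centre of a 21 x 21 image, blurred by a 3-pixel-FWHM Moffat PSF
image = np.zeros((21, 21))
image[10, 10] = 1.0
kernel = moffat_kernel(a=5, b=5, fwhm=3.0)
blurred = apply_psf(image, kernel)
print(blurred.sum(), blurred[10, 10])
```

As written, Eq. (2.1) evaluates $I$ at $x+i$, $y+j$, which is formally a correlation. For a symmetric kernel such as this Moffat profile, the result is identical to a convolution. The point source keeps its total flux, but its peak pixel now holds only a fraction of it.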

2.2 The PAU Survey

Large maps of galaxy tracers are a key ingredient for many cosmological studies. The Physics of the Accelerating Universe Survey (PAUS) is an imaging survey with 40 narrow-band filters, observing at the William Herschel Telescope in La Palma (Spain). As of June 2022, PAUS has imaged ∼40 deg² of the sky. The observed fields are the COSMOS field (2 deg²) and parts of the CFHTLS wide fields W1 (10 deg²), W2 (10 deg²), and W3 (20 deg²). PAUS is also targeting the CFHTLS W4 field, for which it still has very few observations.

2.2.1 History

Back in 1998, two independent teams of astronomers found that the expansion of the Universe is accelerating (Riess et al., 1998; Perlmutter et al., 1999). This discovery had profound implications for our understanding of the Universe. Dark energy was postulated as the most


Precise photometry and photo-zs with multi narrow-band data and deep learning

Laura Cabayol García

Programa de Doctorat en Física, Universitat Autònoma de Barcelona

Director: Dr. Martin Børstad Eriksen

Tutor: Dr. Manuel Delfino Reznicek

Institut de Física d'Altes Energies

A thesis submitted for the degree of Philosophiae Doctor (PhD)

June 2022

Contents

I Concepts

II Galaxy photometry and photo-zs with deep learning

Introduction

Part I

Concepts

Chapter 1

Deep learning background

1.1 Gentle introduction to machine learning

1.2 Implementation of deep learning algorithms

1.2.1 Convolutional Neural Networks


1.2.2 Mixture density networks

1.2.3 Multi-task learning

1.3 Deep learning in astronomy

1.3.1 Object classification

1.3.2 Photometric redshift estimation

1.3.3 Other applications

Chapter 2

The PAU Survey

2.1 Galaxy surveys

2.1.1 Imaging surveys


2.2 The PAU Survey

2.2.1 History