• No se han encontrado resultados

Unsupervised Learning

N/A
N/A
Protected

Academic year: 2022

Share "Unsupervised Learning"

Copied!
14
0
0

Texto completo

(1)

Introduction to

Unsupervised Learning

Dalya Baron (Tel Aviv University)

XXX Winter School, November 2018

(2)

What is the difference between

supervised and unsupervised learning?

(3)

What is the difference between

supervised and unsupervised learning?

figure of merit

(4)

What is the difference between

supervised and unsupervised learning?

figure of merit

Supervised Learning Unsupervised Learning

Input: a list of objects with measured properties and labels.

Input: a list of objects with measured properties.

The algorithm is optimizing a score (cost function) that depends on the input labels and predicted labels.

Prior knowledge is required!

The algorithm detects clusters, complex relations, outliers, or reduces the dimensions of the dataset.

Prior knowledge isn’t required!

(5)

Machine-assisted Knowledge Discovery

New technology

allows us to make new discoveries

The gamma-ray sky by Fermi

The Vela 5B satellite in low-Earth orbit (LANL)

Abbott et al. 2017 Soares-Santos

et al. 2017 Hubble Deep Field

ALMA view of protoplanetary disk, D. Fedele et al.

Gaia and runaway stars (ESA)

(6)

Anatomy of unsupervised learning algorithms

(7)

Anatomy of unsupervised learning algorithms

Input dataset:

• Raw data (spectra, images, light-curves).

• Extracted features.

• Measured relations between different objects (distances, correlations).

(8)

Anatomy of unsupervised learning algorithms

Input dataset:

• Raw data (spectra, images, light-curves).

• Extracted features.

• Measured relations between different objects (distances, correlations).

Hyperparameters

• Tuning parameters of the algorithm.

• Can strongly affect the result.

• Traditionally, cannot be optimized for.

(9)

Anatomy of unsupervised learning algorithms

Input dataset:

• Raw data (spectra, images, light-curves).

• Extracted features.

• Measured relations between different objects (distances, correlations).

Hyperparameters

• Tuning parameters of the algorithm.

• Can strongly affect the result.

• Traditionally, cannot be optimized for.

Internal choices and/or internal cost function

• Usually, we cannot control these.

• Strongly affect the result, and define the range of possible outputs.

(10)

Anatomy of unsupervised learning algorithms

Input dataset:

• Raw data (spectra, images, light-curves).

• Extracted features.

• Measured relations between different objects (distances, correlations).

Hyperparameters

• Tuning parameters of the algorithm.

• Can strongly affect the result.

• Traditionally, cannot be optimized for.

Internal choices and/or internal cost function

• Usually, we cannot control these.

• Strongly affect the result, and define the range of possible outputs.

Algorithm output: clusters, high- dimensional relations, outliers, sparse representation.

Our goal:

exploration

or

inference

(11)

Good Practices

Start simple:

Simulate simple low-dimensional dataset, without noise, where the output can be anticipated.

Compare the output of the algorithm for different data

representations and different choices of hyper-parameters.

Gradually complicate the model:

• Add more dimensions (some of them should be uninformative).

Add noise.

• Compare the output for different representations and hyper- parameters.

Physically-motivated model:

• Simulate a physically-motivated dataset.

• Experiment with different noise properties, different representations, and hyper-parameters.

Try to break the algorithm.

(12)

Topics

Clustering algorithms:

K-means

• Hierarchical Clustering

• Gaussian Mixture model Ensemble methods:

• Supervised decision trees and random forests

• Unsupervised random forests

Dimensionality Reduction algorithms:

• PCA and variants

• tSNE and UMAP

• Autoencoders and SOM Outlier detection

Advanced topics and summary

(13)

Tutorials

(14)

Questions?

Referencias

Documento similar

In the preparation of this report, the Venice Commission has relied on the comments of its rapporteurs; its recently adopted Report on Respect for Democracy, Human Rights and the Rule

Díaz Soto has raised the point about banning religious garb in the ―public space.‖ He states, ―for example, in most Spanish public Universities, there is a Catholic chapel

To do that, we have introduced, for both the case of trivial and non-trivial ’t Hooft non-abelian flux, the back- ground symmetric gauge: the gauge in which the stable SU (N )

The expression ((having become common since the spring)) -the French original reads ((etant devenues populaires des le printems»- refers to the spring of 1708 and is

Results of (a) Electrical Conductivity (EC), (b) Redox potential (Eh), (c) Oxygen concentration, (d) Carbon dioxide concentration measurements with no soil in experiment containers

The Dwellers in the Garden of Allah 109... The Dwellers in the Garden of Allah

Thus, to know the effects of clusters in the local economies, the analysis dimensions are the following seven: (1) Agglomeration economies, (2) Knowledge spillovers, (3) Increased

Ribeiro (5) identified the Moráis and Braganca complexes as autochthonous and similar to Spanish ones, for which Badham and Williams (1) developed an ophiolitic model..