Deep Learning

1 Introduction

Notes from my studies of Deep Learning techniques

2 Classes of Artificial Neural Networks

2.1 Recurrent Neural Networks - RNNs

A recurrent neural network (RNN) is a class of artificial neural networks where connections between nodes form a directed graph along a temporal sequence. source: Wikipedia

2.2 Convolutional Neural Networks - CNNs

In deep learning, a convolutional neural network (CNN, or ConvNet) is a class of deep neural networks, most commonly applied to analyzing visual imagery. source: Wikipedia

2.3 Generative Adversarial Neural Networks - GANs

A generative adversarial network (GAN) is a class of machine learning frameworks designed by Ian Goodfellow and his colleagues in 2014.[1] Two neural networks contest with each other in a game (in the form of a zero-sum game, where one agent's gain is another agent's loss). source: Wikipedia

3 Basic concepts

3.1 Binary classification

  • The label \(y\) is either 1 (cat) or 0 (non-cat)

Given \((x, y)\) with \(x \in \mathbb{R}^{n_{x}}\) and \(y \in \{0, 1\}\), and \(m\) training examples: \(\{(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \dots, (x^{(m)}, y^{(m)})\}\)

\begin{equation} X = \begin{bmatrix} \vert & \vert & \vert & \vert \\ x^{(1)} & x^{(2)} & \dots & x^{(m)} \\ \vert & \vert & \vert & \vert \end{bmatrix}_{n_{x} \times m} \quad X \in \mathbb{R}^{n_{x} \times m} \end{equation}
\begin{equation} Y = \begin{bmatrix} y^{(1)} & y^{(2)} & \dots & y^{(m)} \end{bmatrix}_{1 \times m} \quad Y \in \mathbb{R}^{1 \times m} \end{equation}
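As a sanity check, here is a minimal NumPy sketch of this layout (the variable names and the toy sizes are only illustrative): each example \(x^{(i)}\) becomes a column of \(X\), and the labels form the row vector \(Y\).

  import numpy as np

  n_x, m = 4, 3                                         # 4 features, 3 training examples (toy sizes)
  examples = [np.random.rand(n_x) for _ in range(m)]    # x^(1), ..., x^(m)
  labels = [1, 0, 1]                                    # y^(1), ..., y^(m)

  X = np.stack(examples, axis=1)                        # shape (n_x, m): column i is x^(i)
  Y = np.array(labels).reshape(1, m)                    # shape (1, m)

  print(X.shape, Y.shape)                               # (4, 3) (1, 3)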

3.2 Logistic regression

3.2.1 Model

Given \(x\), want \(\hat{y} = P(y=1 \vert x), x \in \mathbb{R}^{n_{x}}\)

parameters: \(w \in \mathbb{R}^{n_{x}}, b \in \mathbb{R}\)

output: \(\hat{y} = \sigma(w^{T}x + b)\), where \(\sigma(z) = \frac{1}{1 + e^{-z}}\)
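A minimal NumPy sketch of this forward computation (the names and shapes are assumptions, with \(w\) and \(x\) stored as \((n_{x}, 1)\) column vectors):

  import numpy as np

  def sigmoid(z):
      # sigma(z) = 1 / (1 + e^(-z)), applied element-wise
      return 1.0 / (1.0 + np.exp(-z))

  n_x = 4
  w = np.zeros((n_x, 1))                       # weights, shape (n_x, 1)
  b = 0.0                                      # bias, scalar
  x = np.random.rand(n_x, 1)                   # one input example, shape (n_x, 1)

  y_hat = sigmoid(w.T @ x + b)                 # P(y = 1 | x), a (1, 1) array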

In some implementations, \(x_{0} = 1\) is prepended to \(x\), so that \(x \in \mathbb{R}^{n_{x}+1}\) and:

\begin{equation} \mathbf{\Theta}= \begin{bmatrix} \theta_{0} \\ \theta_{1} \\ \vdots \\ \theta_{n_{x}} \end{bmatrix}= \begin{bmatrix} b \\ w_{1} \\ \vdots \\ w_{n_{x}} \end{bmatrix} \end{equation}
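A small sketch of this alternative parameterization, assuming the same column-vector shapes as above: prepending \(x_{0} = 1\) and folding \(b\) into \(\mathbf{\Theta}\) makes \(\mathbf{\Theta}^{T}x\) equal to \(w^{T}x + b\).

  import numpy as np

  n_x = 4
  w = np.random.rand(n_x, 1)
  b = 0.5
  x = np.random.rand(n_x, 1)

  theta = np.vstack([[b], w])                  # shape (n_x + 1, 1): [b, w_1, ..., w_nx]
  x_aug = np.vstack([[1.0], x])                # shape (n_x + 1, 1): [1, x_1, ..., x_nx]

  assert np.isclose(theta.T @ x_aug, w.T @ x + b)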

3.2.2 Gradient Descent

Recall the model output: \(\hat{y} = \sigma(w^{T}x + b)\), where \(\sigma(z) = \frac{1}{1 + e^{-z}}\)

Given \(\{(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \dots, (x^{(m)}, y^{(m)})\}\), want \(\hat{y}^{(i)} \approx y^{(i)}\)

Loss (error) function:

  1. Quadratic error (not used: with the sigmoid output it makes the optimization non-convex, with multiple local minima)
    \begin{equation} \mathcal{L}(\hat{y}, y) = \frac{1}{2}(\hat{y} - y)^2 \end{equation}
  2. Cross-entropy loss function (used instead)
    \begin{equation} \mathcal{L}(\hat{y}, y) = -(y\log{\hat{y}} + (1 - y)\log(1 - \hat{y})) \end{equation}
  3. Cost function definition
    \begin{equation} \begin{split} \mathbf{J}(w, b) & = \frac{1}{m}\sum_{i=1}^{m}\mathcal{L}(\hat{y}^{(i)},y^{(i)})\\ & =-\frac{1}{m}\sum_{i=1}^{m}\Big[y^{(i)}\log{\hat{y}^{(i)}} + (1 - y^{(i)})\log(1-\hat{y}^{(i)})\Big] \end{split} \end{equation}
  4. Objective

    Find \(w, b\) that minimize \(\mathbf{J}(w, b)\)

    [Figure jw.png: cost surface \(\mathbf{J}(w, b)\)]

  5. Learning

    Repeat \(w := w - \alpha\frac{\partial \mathbf{J}(w, b)}{\partial w}\) and \(b := b - \alpha\frac{\partial \mathbf{J}(w, b)}{\partial b}\) until convergence, where \(\alpha\) is the learning rate (see the sketch below).
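A minimal NumPy sketch of the whole procedure, assuming the \(n_{x} \times m\) and \(1 \times m\) layout from section 3.1; the function name and the hyper-parameters alpha and num_iters are illustrative choices, not fixed by these notes.

  import numpy as np

  def sigmoid(z):
      return 1.0 / (1.0 + np.exp(-z))

  def train_logistic_regression(X, Y, alpha=0.01, num_iters=1000):
      # Batch gradient descent on J(w, b); X has shape (n_x, m), Y has shape (1, m).
      n_x, m = X.shape
      w = np.zeros((n_x, 1))
      b = 0.0
      for _ in range(num_iters):
          Y_hat = sigmoid(w.T @ X + b)                         # predictions, shape (1, m)
          cost = -np.mean(Y * np.log(Y_hat) + (1 - Y) * np.log(1 - Y_hat))
          dZ = Y_hat - Y                                       # dJ/dz for each example
          dw = (X @ dZ.T) / m                                  # dJ/dw, shape (n_x, 1)
          db = np.mean(dZ)                                     # dJ/db, scalar
          w -= alpha * dw                                      # gradient descent updates
          b -= alpha * db
      return w, b, cost

With X and Y built as in section 3.1, calling train_logistic_regression(X, Y) returns the fitted \(w\), \(b\) and the final cost.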

Date: 2020-10-30 Fri 04:51

Author: Marcos Moritz

Created: 2020-12-01 Tue 12:31
