Machine Learning Lecture Notes

1 Introduction

1.1 Machine Learning vs. Statistics

Definition

  • Machine Learning: a field that takes an algorithmic approach to data analysis, processing and prediction.

  • Algorithmic: an approach that produces good predictions or extracts useful information from data to solve a practical problem.

Figure 1: Venn diagram showing the overlap of Statistics, Data Science, and AI.

1.2 Applications

Supervised Learning

  • Definition:
    Automate decision-making processes by generalising from input-output pairs $(x_i, y_i)$, $i \in \{1, \dots, N\}$, for some $N \in \mathbb{N}$.
  • Drawback
    Creating a dataset of inputs and outputs is often a laborious manual process.
  • Advantage
    Supervised learning algorithms are well understood and their performance is easy to measure.
  • Example: train ticket pricing by distance (see the sketch below)
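A minimal sketch of the train-ticket example, assuming made-up distance/price pairs and scikit-learn's LinearRegression as the learning algorithm (neither is prescribed by the notes):

```python
# Supervised learning sketch: generalise from input-output pairs (x_i, y_i).
# The distances and prices below are made-up illustrative data.
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[10], [50], [120], [300], [500]])  # inputs x_i: distance in km
y = np.array([2.5, 7.0, 15.0, 34.0, 55.0])       # outputs y_i: ticket price

model = LinearRegression().fit(X, y)             # generalise from the pairs
print(model.predict([[200]]))                    # predict the price of an unseen 200 km trip
```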

Unsupervised Learning

  • Definition
    Only the input data is known, and no known output data is given to the algorithm.
  • Drawback
    Harder to understand and evaluate than supervised learning.
  • Advantage
    Only the input data is needed and there is no process of “creating input-output pairs” involved.
  • Example: clustering a trade portfolio (see the sketch below)
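A matching sketch for the unsupervised case, assuming a made-up feature matrix of assets (mean return, volatility) and k-means as the clustering algorithm:

```python
# Unsupervised learning sketch: only inputs are given, no output labels.
import numpy as np
from sklearn.cluster import KMeans

# One row per asset: (mean return, volatility); values are illustrative.
X = np.array([[0.02, 0.10], [0.03, 0.12], [0.15, 0.40],
              [0.14, 0.38], [0.01, 0.08]])

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)  # e.g. groups low-risk and high-risk assets together
```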

Data & Tasks & Algorithms

  1. data (explained with a case)
    - data point
    - feature
    - feature extraction/engineering

  2. task (figure: overview of ML tasks)
    - regression
    - classification
    - clustering
    - dimensionality reduction
    < Different tasks have different loss functions; refer to 1.3, Function >

  3. algorithm (see the sketch below)
    - support vector machines (SVMs)
    - nearest neighbours
    - random forest
    - k-means
    - matrix factorisation/autoencoder
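All of the algorithms listed above have scikit-learn implementations sharing a common fit/predict interface; a sketch of how they are instantiated (with PCA standing in here for matrix factorisation, an assumption rather than something stated in the notes):

```python
# The listed algorithms behind scikit-learn's uniform estimator interface.
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

supervised = [SVC(), KNeighborsClassifier(), RandomForestClassifier()]
unsupervised = [KMeans(n_clusters=3, n_init=10), PCA(n_components=2)]
# supervised:   estimator.fit(X, y)  then  estimator.predict(X_new)
# unsupervised: estimator.fit(X)     then  transform / predict, depending on the task
```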

1.3 Deep Learning

Definition (supervised & unsupervised)

Deep learning solves problems by employing (artificial) neural networks: functions constructed by alternately composing affine and (simple) non-linear functions.

Task

  • prediction
  • classification
  • image recognition
  • speech recognition and synthesis
  • simulation
  • optimal decision making

  (Figure: tasks of machine learning)

Applications in Finance

  • detect fraud
  • machine read cheques
  • perform credit scoring

Limits

  • limited explainability of deep learning
  • black-box nature of neural networks

Function

$f = (f_1, \dots, f_O) : \mathbb{R}^I \to \mathbb{R}^O$

  • inputs
    $x_1, \dots, x_I$   ($I \in \mathbb{N}$)
  • outputs
    $f_1(x_1, \dots, x_I), \dots, f_O(x_1, \dots, x_I)$   ($O \in \mathbb{N}$)
  • loss function
    $L(f) := \frac{1}{I} \sum_{i=1}^{I} \ell(\hat{f}_i, f_i)$   (e.g. squared loss, absolute loss)
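A small numerical sketch of the average loss under the two losses named above, with hypothetical predictions and targets:

```python
# Average loss L(f) under squared vs. absolute loss, on made-up values.
import numpy as np

f_hat = np.array([1.1, 2.3, 2.9])   # predictions
f     = np.array([1.0, 2.0, 3.0])   # targets

squared_loss  = np.mean((f_hat - f) ** 2)    # l(u, v) = (u - v)^2
absolute_loss = np.mean(np.abs(f_hat - f))   # l(u, v) = |u - v|
print(squared_loss, absolute_loss)
```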

1) Example

  • Regression Problem: data fitting
  • Binary Classification Problem: the direction of the next price change / credit risk analysis (see the sketch below)
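A sketch of the price-direction example, assuming made-up lagged-return features and logistic regression as the classifier (the notes do not fix either):

```python
# Binary classification sketch: predict the direction of the next price
# change (up = 1, down = 0) from two made-up lagged returns.
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[0.01, -0.02], [0.03, 0.01], [-0.02, -0.01],
              [-0.01, 0.02], [0.02, 0.03], [-0.03, -0.02]])  # lagged returns
y = np.array([1, 1, 0, 0, 1, 0])                             # next move: up/down

clf = LogisticRegression().fit(X, y)
print(clf.predict([[0.015, 0.005]]))  # predicted direction for new features
```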

2) Constructing a Class of Functions

  • rich in the sense that it encompasses “almost any” reasonable functional relationship between the outputs and inputs;
  • parameterised by a finite set of parameters,so that we can actually work with it numerically;
  • able to cope with high-dimensional inputs and outputs.

3) Optimal $f$ Selection

  • implementable numerically;
  • efficient enough to be able to cope with large numbers of samples;
  • able to avoid the pitfall of overfitting, that is, producing a function $f$ that performs well with the training data but poorly with other data (see the sketch below).
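A small sketch demonstrating the overfitting pitfall: on made-up noisy data, a degree-9 polynomial achieves near-zero training error but a much larger test error than a lower-degree fit (the data and degrees are illustrative assumptions):

```python
# Overfitting sketch: high-degree polynomial vs. held-out data.
import numpy as np

rng = np.random.default_rng(0)
x_train = np.linspace(0, 1, 10)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.2, 10)  # noisy samples
x_test = np.linspace(0, 1, 50)
y_test = np.sin(2 * np.pi * x_test)                             # clean targets

for degree in (3, 9):
    coeffs = np.polyfit(x_train, y_train, degree)
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(degree, train_err, test_err)  # degree 9: tiny train error, large test error
```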

4) Functions in Deep Learning

  • Function: alternating affine functions and activation functions
    $f = \sigma_r \circ L_r \circ \cdots \circ \sigma_1 \circ L_1 : \mathbb{R}^I \to \mathbb{R}^O$
    where
    $x = (x_1, \dots, x_{d_i}) \in \mathbb{R}^{d_i}$;
    $L_i : \mathbb{R}^{d_{i-1}} \to \mathbb{R}^{d_i}$, $i \in \{1, \dots, r\}$, is an affine function, transmitting $d_{i-1}$ signals to $d_i$ units or neurons;
    $\sigma_i(x) := (\sigma_i(x_1), \dots, \sigma_i(x_{d_i}))$, where $\sigma_i : \mathbb{R} \to \mathbb{R}$ is an activation function applied componentwise to the $d_i$ signals.

    < This satisfies the requirements of 2). >
    < Since the model imposes no specific structure on the data, it is general enough to fit diverse data. >

  • Optimal $f$: stochastic gradient descent (SGD)
    - the matrices and vectors parameterising its layers are updated iteratively
    - a randomly drawn subset of samples (a minibatch) is used at each step
    - the gradient is computed using a form of algorithmic differentiation (backpropagation)

  • Loss function
    generally a function of the residual (e.g. absolute value), though task-specific losses are used in particular cases. (A training sketch follows this list.)
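A minimal NumPy sketch tying the three bullets together: a two-layer network $f = L_2 \circ \sigma_1 \circ L_1$ (final activation taken as the identity) trained by minibatch SGD on the squared loss, with backpropagation written out by hand. The architecture, data, and hyperparameters are illustrative assumptions, not taken from the notes:

```python
# Tiny feedforward network trained with minibatch SGD and backpropagation.
import numpy as np

rng = np.random.default_rng(0)

# Layer widths: d_0 = I = 1 input, d_1 = 16 hidden units, d_2 = O = 1 output.
W1, b1 = rng.normal(0, 1, (16, 1)), np.zeros((16, 1))  # parameters of L_1
W2, b2 = rng.normal(0, 1, (1, 16)), np.zeros((1, 1))   # parameters of L_2
relu = lambda z: np.maximum(z, 0)                      # sigma, applied componentwise

def forward(x):
    z1 = W1 @ x + b1                 # affine map L_1
    a1 = relu(z1)                    # sigma_1
    return z1, a1, W2 @ a1 + b2      # affine map L_2 (identity activation)

# Toy training data: learn y = sin(x) on [0, pi].
X = rng.uniform(0, np.pi, (1, 256))
Y = np.sin(X)

lr, batch = 0.01, 32
for step in range(2000):
    idx = rng.choice(256, batch, replace=False)   # randomly drawn minibatch
    x, y = X[:, idx], Y[:, idx]
    z1, a1, y_hat = forward(x)
    # Backpropagation for the squared loss L = mean((y_hat - y)^2):
    d_out = 2 * (y_hat - y) / batch
    gW2, gb2 = d_out @ a1.T, d_out.sum(axis=1, keepdims=True)
    d_hid = (W2.T @ d_out) * (z1 > 0)             # chain rule through the ReLU
    gW1, gb1 = d_hid @ x.T, d_hid.sum(axis=1, keepdims=True)
    W1 -= lr * gW1; b1 -= lr * gb1                # SGD update of the layer
    W2 -= lr * gW2; b2 -= lr * gb2                # parameters (matrices, vectors)
```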

History of Deep Learning

  • Heaviside function < problem sheet 1 >
    $f(x; w, b) := H(w'x + b)$, where
    $H(x) := \begin{cases} 0, & x < 0 \\ 1, & x \geq 0 \end{cases}$
  • Artificial Neuron case
    - 1: if neuron A is activated, then neuron C gets activated as well (since it receives two input signals from neuron A); but if neuron A is off, then neuron C is off as well.
    - 2: (logical AND) Neuron C is activated only when both neurons A and B are activated (a single input signal is not enough to activate neuron C).
    - 3: (logical OR) Neuron C gets activated if either neuron A or neuron B is activated (or both).
    - 4: (logical NOT) Neuron C is activated only if neuron A is active and neuron B is off. If A is active all the time, then neuron C is active when neuron B is off, and vice versa. (A sketch realising cases 2–4 with the Heaviside neuron follows.)
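A sketch realising cases 2–4 with the Heaviside neuron $f(x; w, b) = H(w'x + b)$; the specific weights and biases are one possible choice, not prescribed by the notes:

```python
# Logic gates from the Heaviside neuron f(x; w, b) = H(w'x + b).
import numpy as np

H = lambda z: np.where(z >= 0, 1, 0)            # Heaviside step function
neuron = lambda x, w, b: H(np.dot(w, x) + b)

for a, b_in in [(0, 0), (0, 1), (1, 0), (1, 1)]:  # states of neurons A, B
    x = np.array([a, b_in])
    print(a, b_in,
          neuron(x, np.array([1, 1]), -2),   # AND: fires only if both inputs fire
          neuron(x, np.array([1, 1]), -1),   # OR: fires if at least one input fires
          neuron(x, np.array([1, -1]), -1))  # A AND (NOT B): case 4
```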