A tutorial on supervised learning from the perspective of mathematical optimization: a thesis in Computer Science

Justin Lovinger

doi:10.62791/19994

Thesis

Open access

Master of Science (MS), University of Massachusetts Dartmouth

2018

Abstract

Mathematical optimization -- Data processing

Data mining.

Popular methods in supervised learning, from regression and neural networks to support vector machines, are commonly presented from the perspective of statistics or biology. Instead, we present common techniques in supervised learning as applications of mathematical optimization and examine the practical benefits this perspective brings.

Under the optimization perspective, linear regression is understood as the function $f(X) = XW + \vec{b}$ that is trained by solving $\arg\min_{W, \vec{b}} \xi(f(X), Y)$, where $W$ is a weight matrix, $X$ is a matrix of training arguments, $Y$ is a matrix of training targets, and $\xi$ is an error function such as mean squared error. Similarly, a multilayer perceptron neural network is understood as a function of the form $f(X) = t_{n-1}(\cdots(t_2(t_1(XW_1 + \vec{b})W_2)\cdots)W_{n-1})$, where $t_i$ is the $i$th transfer function, $W_i$ is the $i$th weight matrix, and $n$ is the number of layers. Under the optimization perspective, training a multilayer perceptron is the same as training a regression model: solve $\arg\min_{W_1, W_2, \ldots, W_{n-1}, \vec{b}} \xi(f(X), Y)$.

Mathematical optimization serves as the workhorse for training by solving the arg min problem. Powerful optimization methods such as Broyden-Fletcher-Goldfarb-Shanno (BFGS) and its limited-memory variant, L-BFGS, can efficiently solve this problem. The inclusion of line search or trust region techniques removes the need for hand-tuned learning rates and drastically improves performance and consistency. Through the rapid training enabled by efficient optimization, more complex models can be applied and larger datasets learned. Bigger data, faster real-time learning, and more effective image recognition are possible.

From the optimization perspective, explanation of models is simplified and implementation is naturally modular and flexible. Optimization techniques are easily reused between models. The development of a new generalization-improving error function is easily propagated to existing and future models. Datasets are better learned and accuracy improved by easily applying, developing, and testing a multitude of models. When supervised learning is performed from the optimization perspective, an equation with adjustable parameters or an error function is all that is necessary to implement a new model and better solve problems in data science and machine learning.
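
As a rough illustration of the idea in the abstract, the sketch below trains a single-output linear model f(X) = Xw + b by minimizing mean squared error. It is not code from the thesis. For brevity it uses plain gradient descent with a backtracking (Armijo) line search as a stand-in for BFGS or L-BFGS. The function names (`predict`, `mse`, `mse_gradient`, `train`), the toy data, and the constants `shrink` and `c1` are all assumptions made for this example.

```python
# Minimal sketch (not the thesis implementation): linear regression trained
# by solving arg min over (w, b) of MSE(f(X), y) with gradient descent.
# A backtracking line search chooses each step size, so there is no
# hand-tuned learning rate. All names and constants here are illustrative.

def predict(X, w, b):
    """Linear model f(X) = Xw + b, one prediction per input row."""
    return [sum(x * wj for x, wj in zip(row, w)) + b for row in X]

def mse(X, y, w, b):
    """Error function: mean squared error between f(X) and targets y."""
    preds = predict(X, w, b)
    return sum((p - t) ** 2 for p, t in zip(preds, y)) / len(y)

def mse_gradient(X, y, w, b):
    """Gradient of the mean squared error with respect to w and b."""
    n = len(y)
    residuals = [p - t for p, t in zip(predict(X, w, b), y)]
    grad_w = [2.0 / n * sum(r * row[j] for r, row in zip(residuals, X))
              for j in range(len(w))]
    grad_b = 2.0 / n * sum(residuals)
    return grad_w, grad_b

def train(X, y, iterations=1000, shrink=0.5, c1=1e-4):
    """Minimize MSE by steepest descent with Armijo backtracking."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(iterations):
        grad_w, grad_b = mse_gradient(X, y, w, b)
        grad_sq = sum(g * g for g in grad_w) + grad_b * grad_b
        if grad_sq < 1e-20:  # gradient is essentially zero: converged
            break
        error = mse(X, y, w, b)
        step = 1.0
        while step > 1e-12:
            new_w = [wj - step * g for wj, g in zip(w, grad_w)]
            new_b = b - step * grad_b
            # Armijo condition: accept only a sufficient decrease in error.
            if mse(X, y, new_w, new_b) <= error - c1 * step * grad_sq:
                w, b = new_w, new_b
                break
            step *= shrink  # otherwise shrink the step and try again
        else:
            break  # no acceptable step found: stop training
    return w, b

# Toy data generated from y = 2*x1 - 3*x2 + 1 (an assumption for the demo).
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1], [1, 2]]
Y = [1, 3, -2, 0, 2, -3]

weights, bias = train(X, Y)
print("weights:", weights, "bias:", bias, "error:", mse(X, Y, weights, bias))
```

Because `train` only needs an error value and its gradient, swapping in a different error function or a different model with adjustable parameters leaves the optimizer untouched, which is the modularity the abstract describes.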

Files and links (1)

pdf

Lovinger J. COE MS Thesis 2018 (977.15 kB)

CC BY-NC-ND V4.0, Open Access

Details

Title: A tutorial on supervised learning from the perspective of mathematical optimization
Creators: Justin Lovinger
ORCID: 0000-0002-3759-6018
Contributors: Iren Todorova Valova (Advisor) - University of Massachusetts Dartmouth, College of Engineering
Xiaoqin Zhang (Committee Member) - University of Massachusetts Dartmouth
Ming Shao (Committee Member) - University of Massachusetts Dartmouth, Department of Computer and Information Science
Number of pages: x, 160 pages
Illustrations: illustrations
Table of contents: List of figures -- List of tables -- List of algorithms -- Chapter 1 : Introduction -- Chapter 2 : Gradient optimization. Step direction ; Line search -- Chapter 3 : Derivative-free and non-smooth optimization. Genetic algorithm ; Population-based incremental learning ; Gravitational search algorithm -- Chapter 4 : Supervised learning. Regression ; Multilayer perceptron ; Radial basis function network ; Error functions ; Support vector machine ; Non-traditional optimization and decision trees -- Chapter 5 : Supervised learning comparison and results. Datasets ; Optimizer comparison ; Model comparison -- Chapter 6 : Conclusion -- Appendix A : Notation -- Appendix B : Raw results -- Bibliography.
References: Includes bibliographical references (pages 137-160).
Awarding Institution: University of Massachusetts Dartmouth
Degree Awarded: Master of Science (MS)
Degree in: Computer Science
Academic Unit: Department of Computer and Information Science
Language: English
Resource Type: Thesis
DOI: https://doi.org/10.62791/19994
Record Identifier: 9914424790501301