Category Archives: Math and Statistics for Data Science, and Engineering


Misc. Math. Data Science. Machine Learning. Optimization. Vector, PCA, Basis, Covariance

Orthonormality: Orthonormal Vectors

"In linear algebra, two vectors in an inner product space are orthonormal if they are orthogonal and unit vectors. A set of vectors form an orthonormal set if all vectors in the set are mutually orthogonal and all of unit length. An orthonormal set which forms a basis is called an orthonormal basis."
https://en.wikipedia.org/wiki/Orthonormality
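
A minimal sketch of the definition quoted above, assuming real vectors and the standard dot product. The helper names `dot` and `is_orthonormal` are illustrative only and do not come from the quoted source.

```python
import math

def dot(u, v):
    """Standard inner product of two real vectors of equal length."""
    return sum(a * b for a, b in zip(u, v))

def is_orthonormal(vectors, tol=1e-9):
    """True if the vectors are mutually orthogonal and each has unit length."""
    for i, u in enumerate(vectors):
        for j, v in enumerate(vectors):
            expected = 1.0 if i == j else 0.0  # <u, u> = 1 and <u, v> = 0
            if abs(dot(u, v) - expected) > tol:
                return False
    return True

s = 1 / math.sqrt(2)
rotated = [(s, s), (-s, s)]        # standard basis rotated by 45 degrees
scaled = [(2.0, 0.0), (0.0, 1.0)]  # orthogonal, but the first vector is not unit length

print(is_orthonormal(rotated))  # True
print(is_orthonormal(scaled))   # False
```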

Basis for a Vector Space
"A vector space's basis is a subset of vectors within the space that are linearly independent and span the space. A basis is linearly independent because the vectors in it cannot be defined as a linear combination of any of the other vectors in the basis."

https://study.com/academy/lesson/finding-the-basis-of-a-vector-space.html

Vector Space
"In linear algebra, you might find yourself working with a set of vectors. When the operations of scalar multiplication and vector addition hold for a set of vectors, we call it a vector space."
https://study.com/academy/lesson/finding-the-basis-of-a-vector-space.html

Explain the concept of covariance matrices based on the shape of data.

Variance captures the axis-aligned spread of the data; covariance captures the diagonal spread.

"The covariance matrix defines the shape of the data. Diagonal spread is captured by the covariance, while axis-aligned spread is captured by the variance."

https://www.visiondummy.com/2014/04/geometric-interpretation-covariance-matrix/

https://www.cs.rutgers.edu/~elgammal/classes/cs536/lectures/i2ml-chap6.pdf

https://pathmind.com/wiki/eigenvector
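
As a rough illustration of how the covariance matrix reflects the shape of the data, the sketch below computes a 2x2 sample covariance matrix for two made-up data sets. The data values and function names are assumptions chosen for the example. The diagonal entries are the variances (axis-aligned spread) and the off-diagonal entries are the covariances (diagonal spread).

```python
def mean(values):
    return sum(values) / len(values)

def covariance(xs, ys):
    """Sample covariance of two equal-length lists (divides by n - 1)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def covariance_matrix(xs, ys):
    """2x2 matrix: variances on the diagonal, covariance off the diagonal."""
    return [[covariance(xs, xs), covariance(xs, ys)],
            [covariance(ys, xs), covariance(ys, ys)]]

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys_diag = [1.0, 2.0, 3.0, 4.0, 5.0]  # y rises with x: spread along the diagonal
ys_axis = [3.0, 1.0, 5.0, 1.0, 3.0]  # made-up values with no linear relation to x

diag_shape = covariance_matrix(xs, ys_diag)  # large off-diagonal entries
axis_shape = covariance_matrix(xs, ys_axis)  # off-diagonal entries near zero

print(diag_shape)
print(axis_shape)
```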

How to derive variance-covariance matrix of coefficients in linear regression

https://stats.stackexchange.com/questions/68151/how-to-derive-variance-covariance-matrix-of-coefficients-in-linear-regression

"The matrix $\operatorname {K} _{\mathbf {YX} }\operatorname {K} _{\mathbf {XX} }^{-1}$ is known as the matrix of regression coefficients, while in linear algebra $\operatorname {K} _{\mathbf {Y|X} }$ is the Schur complement of $\operatorname {K} _{\mathbf {XX} }$ in $\mathbf {\Sigma }$ .
The matrix of regression coefficients may often be given in transpose form, $\operatorname {K} _{\mathbf {XX} }^{-1}\operatorname {K} _{\mathbf {XY} }$ , suitable for post-multiplying a row vector of explanatory variables $\mathbf {X} ^{\rm {T}}$ rather than pre-multiplying a column vector ${\mathbf {X}}$ . In this form they correspond to the coefficients obtained by inverting the matrix of the normal equations of ordinary least squares (OLS)."
https://en.wikipedia.org/wiki/Covariance_matrix
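
For a single explanatory variable, the transpose form $\operatorname{K}_{\mathbf{XX}}^{-1}\operatorname{K}_{\mathbf{XY}}$ reduces to cov(x, y) / var(x). The sketch below checks this on made-up data (roughly y = 2x + 1) against the usual least-squares slope formula; the data and names are assumptions for illustration.

```python
def mean(values):
    return sum(values) / len(values)

def cov(xs, ys):
    """Sample covariance (divides by n - 1)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.0]  # made-up data, roughly y = 2x + 1

k_xx = cov(xs, xs)              # 1x1 "matrix" K_XX
k_xy = cov(xs, ys)              # 1x1 "matrix" K_XY
slope = k_xy / k_xx             # K_XX^{-1} K_XY
intercept = mean(ys) - slope * mean(xs)

# The same slope from the least-squares formula, for comparison.
mx, my = mean(xs), mean(ys)
ols_slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))

print(slope, intercept, ols_slope)
```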

Statistics 512: Applied Linear Models Topic 3

https://www.stat.purdue.edu/~boli/stat512/lectures/topic3.pdf

*** *** ***
Note: Older short-notes from this site are posted on Medium: https://medium.com/@SayedAhmedCanada

*** . *** *** . *** . *** . ***

Sayed Ahmed

BSc. Eng. in Comp. Sc. & Eng. (BUET)
MSc. in Comp. Sc. (U of Manitoba, Canada)
MSc. in Data Science and Analytics (Ryerson University, Canada)
Linkedin: https://ca.linkedin.com/in/sayedjustetc

Blog: http://Bangla.SaLearningSchool.com, http://SitesTree.com
Online and Offline Training: http://Training.SitesTree.com (Also, can be free and low cost sometimes)

Facebook Group/Form to discuss (Q & A): https://www.facebook.com/banglasalearningschool

Our free or paid training events: https://www.facebook.com/justetcsocial

Get access to courses on Big Data, Data Science, AI, Cloud, Linux, System Admin, Web Development and Misc. related. Also, create your own course to sell to others. http://sitestree.com/training/

If you want to contribute to occasional free and/or low cost online/offline training or charitable/non-profit work in the education/health/social service sector, you can financially contribute to: safoundation at salearningschool.com using Paypal or Credit Card (on http://sitestree.com/training/enrol/index.php?id=114 ).

AI ML DS RL DL NN NLP Data Mining Optimization, Math and Statistics for Data Science, and Engineering, ব্লগ । Blog

Misc Math, Data Science, Machine Learning, PCA, FA

January 29, 2020 Sayed

"In mathematics, a set B of elements (vectors) in a vector space V is called a basis, if every element of V may be written in a unique way as a (finite) linear combination of elements of B. The coefficients of this linear combination are referred to as components or coordinates on B of the vector. The elements of a basis are called basis vectors."

Equivalently, B is a basis if its elements are linearly independent and every element of V is a linear combination of elements of B. In more general terms, a basis is a linearly independent spanning set.

A vector space can have several bases; however all the bases have the same number of elements, called the dimension of the vector space.

https://en.wikipedia.org/wiki/Basis_(linear_algebra)
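
To make "written in a unique way as a linear combination" concrete, here is a small sketch for the plane R^2. It finds the coordinates of a vector in a given basis with Cramer's rule and refuses vectors that are linearly dependent. The function name `coordinates` is hypothetical.

```python
def coordinates(b1, b2, v):
    """Coordinates (c1, c2) with c1*b1 + c2*b2 = v, for a basis {b1, b2} of R^2.

    Raises ValueError when b1 and b2 are linearly dependent (not a basis).
    """
    det = b1[0] * b2[1] - b2[0] * b1[1]
    if abs(det) < 1e-12:
        raise ValueError("b1 and b2 are linearly dependent, so they are not a basis")
    c1 = (v[0] * b2[1] - b2[0] * v[1]) / det
    c2 = (b1[0] * v[1] - v[0] * b1[1]) / det
    return c1, c2

# (3, 1) = 2 * (1, 1) + 1 * (1, -1)
print(coordinates((1, 1), (1, -1), (3, 1)))  # (2.0, 1.0)
```

Different bases give different coordinates for the same vector, but every basis of R^2 has exactly two elements, which matches the statement about dimension above.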

Positive Semidefinite Matrix
"A positive semidefinite matrix is a Hermitian matrix all of whose eigenvalues are nonnegative. SEE ALSO: Negative Definite Matrix, Negative Semidefinite Matrix, Positive Definite Matrix, Positive Eigenvalued Matrix, Positive Matrix."

http://mathworld.wolfram.com/PositiveSemidefiniteMatrix.html

Hermitian Matrix

A square matrix is called Hermitian if it is self-adjoint. Therefore, a Hermitian matrix $A = (a_{ij})$ is defined as one for which

$A = A^{\mathrm{H}}$ (1)

where $A^{\mathrm{H}}$ denotes the conjugate transpose. This is equivalent to the condition $a_{ij} = \overline{a_{ji}}$.

http://mathworld.wolfram.com/HermitianMatrix.html
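
A short sketch of the Hermitian condition using Python's built-in complex numbers. A matrix is stored as a list of rows, and the helper names are illustrative.

```python
def conjugate_transpose(a):
    """Transpose the matrix and take the complex conjugate of every entry."""
    rows, cols = len(a), len(a[0])
    return [[a[i][j].conjugate() for i in range(rows)] for j in range(cols)]

def is_hermitian(a):
    """True if the matrix equals its own conjugate transpose."""
    return a == conjugate_transpose(a)

h = [[2, 1 - 1j],
     [1 + 1j, 3]]
not_h = [[2, 1 + 1j],
         [1 + 1j, 3]]

print(is_hermitian(h))      # True
print(is_hermitian(not_h))  # False
```

Note that the diagonal of a Hermitian matrix must be real, and a real symmetric matrix is a special case.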

Definiteness of a matrix

"In linear algebra, a symmetric $n\times n$ real matrix $M$ is said to be positive definite if the scalar $z^{\textsf {T}}Mz$ is strictly positive for every non-zero column vector $z$ of $n$ real numbers. Here $z^{\textsf {T}}$ denotes the transpose of $z$ .[1] When interpreting $Mz$ as the output of an operator, $M$ , that is acting on an input, $z$ , the property of positive definiteness implies that the output always has a positive inner product with the input, as often observed in physical processes."
https://en.wikipedia.org/wiki/Definiteness_of_a_matrix
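
For a symmetric 2x2 real matrix, positive definiteness can be checked by hand. The sketch below evaluates the quadratic form z^T M z directly and also computes the two eigenvalues in closed form (the matrix is positive definite exactly when both are positive). The example matrices are made up.

```python
import math

def quadratic_form(m, z):
    """Compute z^T M z for a 2x2 matrix m and a 2-vector z."""
    return sum(z[i] * m[i][j] * z[j] for i in range(2) for j in range(2))

def eigenvalues_sym2(m):
    """Eigenvalues (smallest first) of a symmetric matrix [[a, b], [b, d]]."""
    (a, b), (_, d) = m
    mid = (a + d) / 2
    radius = math.hypot((a - d) / 2, b)
    return mid - radius, mid + radius

def is_positive_definite(m):
    return eigenvalues_sym2(m)[0] > 0

pd = [[2.0, -1.0], [-1.0, 2.0]]     # eigenvalues 1 and 3
indef = [[1.0, 2.0], [2.0, 1.0]]    # eigenvalues -1 and 3

print(is_positive_definite(pd), quadratic_form(pd, (1.0, 1.0)))
print(is_positive_definite(indef), quadratic_form(indef, (1.0, -1.0)))
```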

Singular value decomposition

Illustration of the singular value decomposition UΣV* of a real 2×2 matrix M.

Top: The action of M, indicated by its effect on the unit disc D and the two canonical unit vectors e1 and e2.
Left: The action of V*, a rotation, on D, e1, and e2.
Bottom: The action of Σ, a scaling by the singular values σ1 horizontally and σ2 vertically.
Right: The action of U, another rotation.

"In linear algebra, the singular value decomposition (SVD) is a factorization of a real or complex matrix that generalizes the eigendecomposition of a square normal matrix to any $m\times n$ matrix via an extension of the polar decomposition.

Specifically, the singular value decomposition of an $m\times n$ real or complex matrix $\mathbf {M}$ is a factorization of the form $\mathbf {U\Sigma V^{*}}$ , where $\mathbf {U}$ is an $m\times m$ real or complex unitary matrix, $\mathbf{\Sigma}$ is an $m\times n$ rectangular diagonal matrix with non-negative real numbers on the diagonal, and $\mathbf {V}$ is an $n\times n$ real or complex unitary matrix. If $\mathbf {M}$ is real, $\mathbf {U}$ and $\mathbf {V} =\mathbf {V^{*}}$ are real orthonormal matrices."

https://en.wikipedia.org/wiki/Singular_value_decomposition
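
A minimal sketch for real 2x2 matrices only: the singular values are the square roots of the eigenvalues of M^T M, which can be written in closed form in this small case. This is an illustration rather than a general SVD routine; in practice a library function such as `numpy.linalg.svd` would be used.

```python
import math

def singular_values_2x2(m):
    """Singular values of a real 2x2 matrix, largest first."""
    (a, b), (c, d) = m
    # Entries of the symmetric matrix M^T M = [[p, q], [q, r]].
    p = a * a + c * c
    q = a * b + c * d
    r = b * b + d * d
    mid = (p + r) / 2
    radius = math.hypot((p - r) / 2, q)
    # Eigenvalues of M^T M are mid +/- radius; clamp tiny negatives from rounding.
    return math.sqrt(mid + radius), math.sqrt(max(mid - radius, 0.0))

m = [[3.0, 0.0], [4.0, 5.0]]
s1, s2 = singular_values_2x2(m)
det = m[0][0] * m[1][1] - m[0][1] * m[1][0]

# The product of the singular values equals |det M|.
print(s1, s2, s1 * s2, abs(det))
```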

PCA using Python (scikit-learn)

https://towardsdatascience.com/pca-using-python-scikit-learn-e653f8989e60

Random R code in relation to PCA

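# Example data (an assumption for this note: any numeric data frame works;
# iris ships with R)
pca_data <- iris[, 1:4]

# Center and scale each column
normalized_mat <- scale(pca_data)

# Calculate the covariance matrix
cov_mat <- cov(normalized_mat)

# Eigenvalues and eigenvectors using the built-in eigen function
# (no need to write our own eigen solver here)
eig <- eigen(cov_mat)

# Verify with prcomp (R's principal components function);
# scale. = TRUE matches the normalization above
prcomp(pca_data, scale. = TRUE)

# Eigenvectors (one principal direction per column, equal to the prcomp
# rotation up to sign) and their transpose
eig$vectors
t(eig$vectors)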

Some more information on PCA and FA (Factor Analysis)
https://www.cs.rutgers.edu/~elgammal/classes/cs536/lectures/i2ml-chap6.pdf


Optimization, Data Science, Math

January 28, 2020 Sayed

Optimization Problem:

Advances in Missile Guidance, Control, and Estimation

Preview:
https://play.google.com/books/reader?id=A2PMBQAAQBAJ&hl=en_GB&pg=GBS.PR14

https://books.google.ca/books?id=A2PMBQAAQBAJ&pg=PA595&lpg=PA595&dq=force+moment+interaction+with+thrusters&source=bl&ots=BruxnXwLzp&sig=ACfU3U39G-l3xDzbotOBJHcMV5uR7DkciQ&hl=en&sa=X&ved=2ahUKEwjZpsT44afnAhXRJt8KHfPYCroQ6AEwCnoECAoQAQ#v=onepage&q=force%20moment%20interaction%20with%20thrusters&f=false

"What is the difference between affine and linear?
A linear function fixes the origin, whereas an affine function need not do so. An affine function is the composition of a linear function with a translation, so while the linear part fixes the origin, the translation can map it somewhere else."

"If you choose bases for vector spaces 𝑉 and 𝑊 of dimensions 𝑚 and 𝑛 respectively, and consider functions 𝑓:𝑉→𝑊, then 𝑓 is linear if 𝑓(𝑣)=𝐴𝑣 for some 𝑛×𝑚 matrix 𝐴 and 𝑓 is affine if 𝑓(𝑣)=𝐴𝑣+𝑏 for some matrix 𝐴 and vector 𝑏, where coordinate representations are used with respect to the bases chosen."

https://math.stackexchange.com/questions/275310/what-is-the-difference-between-linear-and-affine-function/275327
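
A small sketch of the distinction in coordinates, with a made-up matrix A and vector b: the linear map f(v) = Av sends the origin to the origin, while the affine map f(v) = Av + b sends it to b.

```python
def linear(a, v):
    """f(v) = A v, with A given as a list of rows."""
    return [sum(a[i][j] * v[j] for j in range(len(v))) for i in range(len(a))]

def affine(a, b, v):
    """f(v) = A v + b: the linear map followed by a translation by b."""
    return [y + bi for y, bi in zip(linear(a, v), b)]

A = [[2.0, 0.0], [1.0, 3.0]]
b = [1.0, -1.0]
origin = [0.0, 0.0]

print(linear(A, origin))     # [0.0, 0.0]  (the origin is fixed)
print(affine(A, b, origin))  # [1.0, -1.0] (the origin is moved to b)
```

Additivity also separates the two: `linear(A, u + v)` equals `linear(A, u) + linear(A, v)` entry by entry, while the affine map adds b twice on the right-hand side.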


Misc. Math for Data Science, Engineering, and/or Optimization

January 28, 2020 Sayed

What is the Inverse of a Matrix?

https://www.mathsisfun.com/algebra/matrix-inverse.html

What is Norm?
"In linear algebra, functional analysis, and related areas of mathematics, a norm is a function that satisfies certain properties pertaining to scalability and additivity, and assigns a strictly positive real number to each vector in a vector space over the field of real or complex numbers, except for the zero vector, which is assigned zero.

A pseudonorm (seminorm), on the other hand, is allowed to assign zero to some non-zero vectors (in addition to the zero vector).

The term "norm" is commonly used to refer to the vector norm in Euclidean space. It is known as the "Euclidean norm", which is technically called the L2-norm. The Euclidean norm maps a vector to its length in Euclidean space. Because of this, the Euclidean norm is often known as the magnitude."

"A vector space on which a norm is defined is called a normed vector space. Similarly, a vector space with a seminorm is called a semi normed vector space. It is often possible to supply a norm for a given vector space in more than one way."

https://en.wikipedia.org/wiki/Norm_(mathematics)
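
To illustrate that a vector space can carry more than one norm, here is a sketch of three common norms on R^n (L1, Euclidean/L2, and maximum norm). The function names are chosen for the example.

```python
import math

def l1_norm(v):
    """Sum of absolute values."""
    return sum(abs(x) for x in v)

def l2_norm(v):
    """Euclidean length."""
    return math.sqrt(sum(x * x for x in v))

def max_norm(v):
    """Largest absolute component."""
    return max(abs(x) for x in v)

v = (3.0, -4.0)
print(l1_norm(v), l2_norm(v), max_norm(v))  # 7.0 5.0 4.0

# Every norm satisfies the triangle inequality ||u + w|| <= ||u|| + ||w||.
u, w = (1.0, 2.0), (2.0, -6.0)
total = tuple(a + b for a, b in zip(u, w))
print(l2_norm(total) <= l2_norm(u) + l2_norm(w))  # True
```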

What is Linear programming?

"Linear programming (LP, also called linear optimization) is a method to achieve the best outcome (such as maximum profit or lowest cost) in a mathematical model whose requirements are represented by linear relationships. "

"More formally, linear programming is a technique for the optimization of a linear objective function, subject to linear equality and linear inequality constraints. Its feasible region is a convex polytope, which is a set defined as the intersection of finitely many half spaces, each of which is defined by a linear inequality. Its objective function is a real-valued affine (linear) function defined on this polyhedron. A linear programming algorithm finds a point in the polytope where this function has the smallest (or largest) value if such a point exists.

Linear programs are problems that can be expressed in canonical form as

${\begin{aligned}&{\text{Maximize}}&&\mathbf {c} ^{\mathrm {T} }\mathbf {x} \\&{\text{subject to}}&&A\mathbf {x} \leq \mathbf {b} \\&{\text{and}}&&\mathbf {x} \geq \mathbf {0} \end{aligned}}$ "

https://en.wikipedia.org/wiki/Linear_programming
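
A toy sketch for two variables only. Since an optimum of a bounded, feasible linear program is attained at a vertex of the polytope, the code simply tries every intersection of two constraint boundaries and keeps the best feasible one. This brute-force vertex enumeration is not the simplex method or any production algorithm, and the problem data are made up. The sign constraints x >= 0 are written as -x <= 0.

```python
from itertools import combinations

def solve_lp_2d(c, rows, tol=1e-9):
    """Maximize c . x subject to a . x <= b for each (a, b) in rows.

    Assumes the feasible region is bounded. Returns (point, value),
    or None if no vertex is feasible.
    """
    best = None
    for (a1, b1), (a2, b2) in combinations(rows, 2):
        det = a1[0] * a2[1] - a1[1] * a2[0]
        if abs(det) < tol:
            continue  # parallel boundaries do not meet in a single point
        # Intersection of the two boundary lines (Cramer's rule).
        x = (b1 * a2[1] - a1[1] * b2) / det
        y = (a1[0] * b2 - b1 * a2[0]) / det
        if all(a[0] * x + a[1] * y <= b + tol for a, b in rows):
            value = c[0] * x + c[1] * y
            if best is None or value > best[1]:
                best = ((x, y), value)
    return best

# Maximize x + 2y subject to x + y <= 4, x + 3y <= 6, x >= 0, y >= 0.
rows = [((1, 1), 4), ((1, 3), 6), ((-1, 0), 0), ((0, -1), 0)]
best = solve_lp_2d((1, 2), rows)
print(best)  # ((3.0, 1.0), 5.0)
```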

*** . *** . ***
Note: Older short-notes from this site are posted on Medium: https://medium.com/@SayedAhmedCanada

*** . *** *** . *** . *** . ***

Blog: http://Bangla.SaLearningSchool.com, http://SitesTree.com
Online and Offline Training: http://Training.SitesTree.com (Also, can be free and low cost sometimes)

Facebook Group/Form to discuss (Q & A): https://www.facebook.com/banglasalearningschool

Our free or paid training events: https://www.facebook.com/justetcsocial

Get access to courses on Big Data, Data Science, AI, Cloud, Linux, System Admin, Web Development and Misc. related. Also, create your own course to sell to others. http://sitestree.com/training/

Math and Statistics for Data Science, and Engineering, ব্লগ । Blog

Misc. Math. Might Relate to Optimization

January 26, 2020 Sayed

find the equation for a line

http://www.webmath.com/_answer.php

Parametric forms for lines and vectors

https://www.futurelearn.com/courses/maths-linear-quadratic-relations/0/steps/12128

Solving Systems of Linear Equations Using Matrices

https://www.mathsisfun.com/algebra/systems-linear-equations-matrices.html

Affine Space

Subspace
https://www.wolframalpha.com/input/?i=subspace

"What is an affine set?
A set is called “affine” iff for any two points in the set, the line through them is contained in the set. In other words, for any two points in the set, their affine combinations are in the set itself. Theorem 1. A set is affine iff any affine combination of points in the set is in the set itself."
https://www.cse.iitk.ac.in/users/rmittal/prev_course/s14/notes/lec3.pdf [good one to check]

linear/conic/affine/convex combination

https://observablehq.com/@eliaskal/point-combinations-linear-conic-affine-convex
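
The four kinds of combination differ only in the restrictions placed on the coefficients. A small sketch, using the standard definitions (conic: all coefficients nonnegative; affine: coefficients sum to 1; convex: both). The function name is hypothetical.

```python
def combination_types(coeffs, tol=1e-9):
    """List which kinds of combination a coefficient vector qualifies as."""
    kinds = ["linear"]  # any real coefficients give a linear combination
    nonnegative = all(c >= -tol for c in coeffs)
    sums_to_one = abs(sum(coeffs) - 1.0) <= tol
    if nonnegative:
        kinds.append("conic")
    if sums_to_one:
        kinds.append("affine")
    if nonnegative and sums_to_one:
        kinds.append("convex")
    return kinds

print(combination_types([0.25, 0.75]))  # ['linear', 'conic', 'affine', 'convex']
print(combination_types([2.0, -1.0]))   # ['linear', 'affine']
print(combination_types([2.0, 3.0]))    # ['linear', 'conic']
print(combination_types([1.0, -3.0]))   # ['linear']
```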

"In linear algebra, the column space (also called the range or image) of a matrix A is the span (set of all possible linear combinations) of its column vectors. The column space of a matrix is the image or range of the corresponding matrix transformation.
https://en.wikipedia.org/wiki/Row_and_column_spaces

"Any linear combination of the column vectors of a matrix A can be written as the product of A with a column vector:" $c_1\mathbf{v}_1 + \cdots + c_n\mathbf{v}_n = A\,(c_1, \ldots, c_n)^{\mathsf{T}}$, where $\mathbf{v}_1, \ldots, \mathbf{v}_n$ are the columns of A.

Infimum and supremum

A set T of real numbers (hollow and filled circles), a subset S of T (filled circles), and the infimum of S. Note that for finite, totally ordered sets the infimum and the minimum are equal.

A set A of real numbers (blue circles), a set of upper bounds of A (red diamond and circles), and the smallest such upper bound, that is, the supremum of A (red diamond).

"In mathematics, the infimum (abbreviated inf; plural infima) of a subset S of a partially ordered set T is the greatest element in T that is less than or equal to all elements of S, if such an element exists. Consequently, the term greatest lower bound (abbreviated as GLB) is also commonly used.

The supremum (abbreviated sup; plural suprema) of a subset S of a partially ordered set T is the least element in T that is greater than or equal to all elements of S, if such an element exists. Consequently, the supremum is also referred to as the least upper bound (or LUB)."
https://en.wikipedia.org/wiki/Infimum_and_supremum
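
The definitions can be followed literally for a finite partially ordered set. The sketch below does this for two orders on the integers 1..100: the usual order and divisibility (where "a is less than or equal to b" means a divides b). In the divisibility order the infimum of a set is its greatest common divisor and the supremum is its least common multiple, when that lies in T. The function names are illustrative.

```python
def infimum(subset, poset, leq):
    """Greatest element of poset that is <= every element of subset, or None."""
    lower_bounds = [t for t in poset if all(leq(t, s) for s in subset)]
    for g in lower_bounds:
        if all(leq(other, g) for other in lower_bounds):
            return g
    return None

def supremum(subset, poset, leq):
    """Least element of poset that is >= every element of subset, or None."""
    upper_bounds = [t for t in poset if all(leq(s, t) for s in subset)]
    for g in upper_bounds:
        if all(leq(g, other) for other in upper_bounds):
            return g
    return None

def usual(a, b):
    return a <= b

def divides(a, b):
    return b % a == 0

T = range(1, 101)
S = [12, 18]
print(infimum(S, T, usual), supremum(S, T, usual))      # 12 18
print(infimum(S, T, divides), supremum(S, T, divides))  # 6 36

# In a smaller poset the infimum may not exist: 2 and 3 are both lower
# bounds of {12, 18} under divisibility, but neither divides the other.
print(infimum(S, [2, 3, 12, 18], divides))  # None
```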


Industry Job Prospect for Graph Mining

January 24, 2020 Sayed


Sample Jobs

https://www.careerbuilder.com/jobs-graph-mining

https://www.indeed.com/q-Graph-Mining-jobs.html

For example, Google works in several areas of Graph Mining (listed further below) and has jobs in them. Facebook and other social networking sites will also have jobs related to Graph Mining. Computational biology, medical research, drug discovery, disease diagnosis, transportation, and scheduling (including shipping scheduling) will have applications and jobs for Graph Mining.

Job Areas:

General data mining jobs and Machine/Deep/Reinforcement Learning jobs will sometimes require Graph Mining expertise. Examples of real positions: Research Intern - Deep Learning for Graphs; ML Engineer - Siri Knowledge Graph.

Computer networks and network/cyber security application development (also R&D) positions might ask for Graph Mining expertise.

Graph Mining will also have applications and jobs in biology, chemistry, and drug design, as well as in transportation.

Social network mining will always involve Graph Mining. Example application: friend recommendation.

Trajectory Data Mining Jobs at Microsoft

https://www.microsoft.com/en-us/research/publication/trajectory-data-mining-an-overview/

Graph Mining Jobs (areas) at Google:
https://ai.google/research/teams/algorithms-optimization/graph-mining/

"Large-Scale Balanced Partitioning: Example Google Maps Driving Directions, Large-Scale Clustering:clustering graphs at Google scale, Large-Scale Connected Components, Large-Scale Link Modeling: similarity ranking and centrality metrics: link prediction and anomalous link discovery., Large-Scale Similarity Ranking: Personalized PageRank, Egonet similarity, Adamic Adar, and others, Public-private Graph Computation, Streaming and Dynamic Graph Algorithms, ASYMP: Async Message Passing Graph Mining, Large-Scale Centrality Ranking, Large-Scale Graph Building"

More Related Jobs:

Tools in jobs: https://www.researchgate.net/post/Can_you_suggest_a_graph_mining_tool

https://www.linkedin.com/jobs/gephi-jobs/

https://bit.ly/2Nwlnrp (Data Scientist 2)


Part X: Engineering Optimization: Mathematical Optimization

January 23, 2020 Sayed

Good intro to: Quadratic Forms and Convexity
https://www.dr-eriksen.no/teaching/GRA6035/2010/lecture4.pdf

Concave Upward and Downward

https://www.mathsisfun.com/calculus/concave-up-down-convex.html

Convex functions and K-Convexity
https://ljk.imag.fr/membres/Anatoli.Iouditski/cours/convex/chapitre_3.pdf


Bayesian Statistics and Machine Learning

January 19, 2020 Sayed


"Bayesian inference is a method of statistical inference in which Bayes' theorem is used to update the probability for a hypothesis as more evidence or information becomes available. Bayesian inference is an important technique in statistics, and especially in mathematical statistics."

https://en.wikipedia.org/wiki/Bayesian_inference

"Firstly, (statistical) inference is the process of deducing properties about a population or probability distribution from data"

"Bayesian inference is therefore just the process of deducing properties about a population or probability distribution from data using Bayes’ theorem. That’s it."
https://towardsdatascience.com/probability-concepts-explained-bayesian-inference-for-parameter-estimation-90e8930e5348
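
A minimal sketch of a Bayesian update with two made-up hypotheses about a coin (fair, or biased with heads probability 0.8). Each observed flip multiplies the prior by the likelihood and renormalizes, which is Bayes' theorem applied once per piece of evidence. The numbers and names are assumptions for illustration.

```python
def update(prior, likelihood):
    """One application of Bayes' theorem over a finite set of hypotheses."""
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    evidence = sum(unnormalized.values())  # P(data), the normalizing constant
    return {h: p / evidence for h, p in unnormalized.items()}

heads_prob = {"fair": 0.5, "biased": 0.8}  # P(heads | hypothesis)
belief = {"fair": 0.5, "biased": 0.5}      # prior: no preference

for flip in "HHTH":
    likelihood = {h: (p if flip == "H" else 1 - p) for h, p in heads_prob.items()}
    belief = update(belief, likelihood)
    print(flip, belief)
```

After the four flips, the posterior for "biased" is 0.1024 / (0.1024 + 0.0625), a little over 0.62.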

Introduction to Bayesian Inference

https://blogs.oracle.com/datascience/introduction-to-bayesian-inference

Bayesian Linear Regression

"In the Bayesian viewpoint, we formulate linear regression using probability distributions rather than point estimates. The response, y, is not estimated as a single value, but is assumed to be drawn from a probability distribution. The model for Bayesian Linear Regression with the response sampled from a normal distribution is:" $y \sim N(\beta^{T}X, \sigma^{2}I)$, that is, a normal distribution whose mean is the linear predictor $\beta^{T}X$ and whose variance is $\sigma^{2}$.

https://towardsdatascience.com/introduction-to-bayesian-linear-regression-e66e60791ea7
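
A simplified sketch with one coefficient and no intercept, assuming a known noise variance and a zero-mean normal prior on the slope. Under those assumptions the posterior over the slope is again normal and has a closed form. This is a textbook conjugate special case, not the approach of the linked article, and the data are made up.

```python
def posterior_for_slope(xs, ys, noise_var, prior_var):
    """Posterior mean and variance of w in y = w * x + noise.

    Assumes noise ~ N(0, noise_var) and prior w ~ N(0, prior_var).
    """
    precision = 1.0 / prior_var + sum(x * x for x in xs) / noise_var
    mean = (sum(x * y for x, y in zip(xs, ys)) / noise_var) / precision
    return mean, 1.0 / precision

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]  # made-up data lying exactly on y = 2x

wide_mean, wide_var = posterior_for_slope(xs, ys, noise_var=1.0, prior_var=100.0)
tight_mean, tight_var = posterior_for_slope(xs, ys, noise_var=1.0, prior_var=0.01)

print(wide_mean, wide_var)    # weak prior: mean close to 2
print(tight_mean, tight_var)  # strong prior at 0: mean pulled toward 0
```

The result is a distribution over the slope rather than a single estimate, which is the point of the Bayesian formulation.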

"Bayesian model selection

Tom Minka

Bayesian model selection uses the rules of probability theory to select among different hypotheses. It is completely analogous to Bayesian classification."

http://alumni.media.mit.edu/~tpminka/statlearn/demo/

Logistic Regression from Bayes' Theorem

https://www.countbayesie.com/blog/2019/6/12/logistic-regression-from-bayes-theorem

Kernel Trick and Kernels
https://svivek.com/teaching/lectures/slides/svm/kernels.pdf

"When talking about kernels in machine learning, most likely the first thing that comes into your mind is the support vector machines (SVM)"

https://medium.com/@zxr.nju/what-is-the-kernel-trick-why-is-it-important-98a98db0961d
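
A small numeric check of the idea behind the kernel trick, using the degree-2 polynomial kernel k(x, y) = (x . y)^2 on 2-D inputs. The kernel value computed in the original space equals an ordinary dot product in a 3-D feature space, so the feature map never has to be built explicitly. The function names are illustrative.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def poly_kernel(x, y):
    """Degree-2 polynomial kernel, computed in the original 2-D space."""
    return dot(x, y) ** 2

def feature_map(x):
    """Explicit feature map whose dot product reproduces poly_kernel."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)

x, y = (1.0, 2.0), (3.0, 4.0)
implicit = poly_kernel(x, y)                     # (1*3 + 2*4)^2 = 121
explicit = dot(feature_map(x), feature_map(y))   # same value via 3-D features

print(implicit, explicit)
```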

Gaussian Processes

"or why I don’t use SVMs"

https://mlss2011.comp.nus.edu.sg/uploads/Site/lect1gp.pdf

Gaussian Process Classification and Active Learning with Multiple Annotators

http://proceedings.mlr.press/v32/rodrigues14.pdf

"Assumed density filtering is an online inference algorithm that incrementally updates the posterior over W after observing new evidence."

https://www.aaai.org/ocs/index.php/AAAI/AAAI16/paper/download/12391/11777

"Expectation propagation (EP) is a technique in Bayesian machine learning. EP finds approximations to a probability distribution. It uses an iterative approach that leverages the factorization structure of the target distribution."

https://en.wikipedia.org/wiki/Expectation_propagation

rejection sampling

In numerical analysis and computational statistics, rejection sampling is a basic technique used to generate observations from a distribution. It is also commonly called the acceptance-rejection method or "accept-reject algorithm" and is a type of exact simulation method. The method works for any distribution in $\mathbb {R} ^{m}$ with a density.

https://en.wikipedia.org/wiki/Rejection_sampling
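
A minimal sketch of rejection sampling for an assumed target density f(x) = 2x on [0, 1], with a Uniform(0, 1) proposal g and envelope constant M = 2 (so f(x) <= M g(x) everywhere). A candidate x is accepted with probability f(x) / (M g(x)) = x. The target and all names are chosen for the example.

```python
import random

def sample_target(rng):
    """Draw one sample from the density f(x) = 2x on [0, 1]."""
    while True:
        x = rng.random()        # candidate from the uniform proposal
        u = rng.random()        # uniform threshold for the accept/reject test
        if u <= (2 * x) / 2.0:  # accept with probability f(x) / (M * g(x))
            return x

rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [sample_target(rng) for _ in range(20000)]
sample_mean = sum(samples) / len(samples)

# The exact mean of the target density is 2/3.
print(sample_mean)
```

On average half of the candidates are rejected here, since the acceptance rate is 1 / M.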

Bayesian Deep learning

"However, graphical networks such as Bayesian networks are useful to model uncertainty along with causal inference and logic deduction. In this regard, Bayesian Deep learning combines perception (deep learning) with strong probabilistic inference which can estimate uncertainty."

How different is Bayesian deep learning from deep learning? - Quora


Optimization and Linear Algebra/Math from the Internet

January 18, 2020 Sayed


First-order Taylor approximation formula

https://www.thestudentroom.co.uk/showthread.php?t=1247928
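
For reference, the first-order Taylor approximation of f around a point a is f(x) ≈ f(a) + f'(a)(x - a). A small sketch, using exp around a = 0 as an assumed example (so the approximation is 1 + x):

```python
import math

def first_order_taylor(f, df, a):
    """Return the linear approximation x -> f(a) + f'(a) * (x - a)."""
    fa, dfa = f(a), df(a)
    return lambda x: fa + dfa * (x - a)

approx = first_order_taylor(math.exp, math.exp, 0.0)  # exp is its own derivative

for x in (0.01, 0.1, 0.5):
    print(x, approx(x), math.exp(x), math.exp(x) - approx(x))
```

The error shrinks roughly like x^2 / 2 as x approaches the expansion point, which is why the approximation is only used locally.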

Hessian Matrix

https://en.wikipedia.org/wiki/Hessian_matrix

"In mathematics, the Hessian matrix or Hessian is a square matrix of second-order partial derivatives of a scalar-valued function, or scalar field. It describes the local curvature of a function of many variables."

Use in optimization

"Hessian matrices are used in large-scale optimization problems within Newton-type methods because they are the coefficient of the quadratic term of a local Taylor expansion of a function. That is,

$y=f(\mathbf {x} +\Delta \mathbf {x} )\approx f(\mathbf {x} )+\nabla f(\mathbf {x} )^{\mathrm {T} }\Delta \mathbf {x} +{\frac {1}{2}}\Delta \mathbf {x} ^{\mathrm {T} }\mathbf {H} (\mathbf {x} )\Delta \mathbf {x}$"

Newton's method in optimization

"In calculus, Newton's method is an iterative method for finding the roots of a differentiable function F, which are solutions to the equation F(x) = 0. In optimization, Newton's method is applied to the derivative f′ of a twice-differentiable function f to find the roots of the derivative (solutions to f′(x) = 0), also known as the stationary points of f. These solutions may be minima, maxima, or saddle points."

https://en.wikipedia.org/wiki/Newton%27s_method_in_optimization
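
A one-dimensional sketch of Newton's method applied to the derivative, with the update x ← x - f′(x) / f″(x). The example function f(x) = e^x - 2x is an assumption chosen because its minimizer is known exactly (x = ln 2) and its second derivative is always positive.

```python
import math

def newton_minimize(df, d2f, x0, tol=1e-12, max_iter=50):
    """Find a stationary point of f by applying Newton's method to f'."""
    x = x0
    for _ in range(max_iter):
        step = df(x) / d2f(x)  # Newton step for the equation f'(x) = 0
        x -= step
        if abs(step) < tol:
            break
    return x

# f(x) = exp(x) - 2x, so f'(x) = exp(x) - 2 and f''(x) = exp(x).
x_star = newton_minimize(lambda x: math.exp(x) - 2.0,
                         lambda x: math.exp(x),
                         x0=0.0)

print(x_star, math.log(2.0))
```

In several variables the division by f″ becomes a linear solve with the Hessian matrix.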

SOLVING LINEAR DIFFERENTIAL EQUATIONS WITH THE LAPLACE TRANSFORM

https://onlinelibrary.wiley.com/doi/pdf/10.1002/9781118733639.app6

Pointwise supremum of a convex function collection

"I think it is either assumed that the 𝑓𝑖 are defined on the same domain 𝐷, or that (following a common convention) we set 𝑓𝑖(𝑥)=+∞ if 𝑥∉Dom(𝑓𝑖). You can easily check that under this convention, the extended 𝑓𝑖 still remain convex and the claim is true."

https://math.stackexchange.com/questions/402919/pointwise-supremum-of-a-convex-function-collection?rq=1

"The supremum of a set is its least upper bound and the infimum is its greatest lower bound."

https://www.math.ucdavis.edu/~hunter/m125b/ch2.pdf

Sine and Cosine Values

https://math.stackexchange.com/questions/1553990/easy-way-of-memorizing-values-of-sine-cosine-and-tangent/1554126

Barrier Function

"In constrained optimization, a field of mathematics, a barrier function is a continuous function whose value on a point increases to infinity as the point approaches the boundary of the feasible region of an optimization problem."

https://en.wikipedia.org/wiki/Barrier_function
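
A small sketch of one common choice, the logarithmic barrier, for an assumed one-dimensional constraint x < 1. The value is finite inside the feasible region and grows without bound as x approaches the boundary.

```python
import math

def log_barrier(x, upper=1.0):
    """Logarithmic barrier -log(upper - x) for the constraint x < upper."""
    if x >= upper:
        return math.inf  # outside (or on the boundary of) the feasible region
    return -math.log(upper - x)

for x in (0.0, 0.9, 0.99, 0.999):
    print(x, log_barrier(x))
```

In an interior-point method this term is added to the objective with a small weight, so that iterates are kept away from the boundary.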

Trace: Matrix

https://en.wikipedia.org/wiki/Trace_(linear_algebra)

Determinant

"The determinant of a matrix A is denoted det(A), det A, or |A|. Geometrically, it can be viewed as the volume scaling factor of the linear transformation described by the matrix. This is also the signed volume of the n-dimensional parallelepiped spanned by the column or row vectors of the matrix. The determinant is positive or negative according to whether the linear mapping preserves or reverses the orientation of n-space."

Ref: https://en.wikipedia.org/wiki/Determinant
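
To see the "signed volume scaling factor" in two dimensions, the sketch below maps the unit square through a 2x2 matrix and compares the signed area of the image (shoelace formula) with the determinant. The trace is included for comparison; the example matrices are made up.

```python
def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def trace(m):
    """Sum of the diagonal entries."""
    return sum(m[i][i] for i in range(len(m)))

def apply(m, p):
    return (m[0][0] * p[0] + m[0][1] * p[1], m[1][0] * p[0] + m[1][1] * p[1])

def signed_area(polygon):
    """Shoelace formula: positive for counter-clockwise vertex order."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(polygon, polygon[1:] + polygon[:1]):
        total += x1 * y2 - x2 * y1
    return total / 2

unit_square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]  # area 1
stretch = [[2.0, 0.0], [0.0, 3.0]]  # scales area by 6
reflect = [[0.0, 1.0], [1.0, 0.0]]  # swaps the axes, reversing orientation

for m in (stretch, reflect):
    image = [apply(m, p) for p in unit_square]
    print(det2(m), signed_area(image), trace(m))
```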


SeDuMi MATLAB add-on: solve optimization problems with linear, quadratic and semidefiniteness constraints

January 18, 2020 Sayed


"Abstract

SeDuMi is an add-on for MATLAB, which lets you solve optimization problems with linear, quadratic and semidefiniteness constraints. It is possible to have complex valued data and variables in SeDuMi. Moreover, large scale optimization problems are solved efficiently, by exploiting sparsity. This paper describes how to work with this toolbox."

https://www.tandfonline.com/doi/abs/10.1080/10556789908805766?journalCode=goms20


Lesson 1: The (Linear) Kalman Filter: State Estimation and Localization for Self-Driving Cars

January 17, 2020 Sayed

https://www.coursera.org/lecture/state-estimation-localization-self-driving-cars/lesson-1-the-linear-kalman-filter-7DFmY



Misc. Optimization. Machine Learning

January 14, 2020 Sayed

"What is machine learning optimization?

Optimization is the most essential ingredient in the recipe of machine learning algorithms. It starts with defining some kind of loss function/cost function and ends with minimizing it using one or the other optimization routine."
https://towardsdatascience.com/demystifying-optimizations-for-machine-learning-c6c6405d3eea
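
As a concrete instance of "define a loss, then minimize it", here is a minimal sketch that fits a one-parameter model y = w * x by gradient descent on the mean squared error. The data, learning rate, and iteration count are assumptions for the example.

```python
def mse_loss(w, xs, ys):
    """Mean squared error of the model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def mse_gradient(w, xs, ys):
    """Derivative of the mean squared error with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # made-up data generated by w = 2

w = 0.0
learning_rate = 0.05
for _ in range(200):
    w -= learning_rate * mse_gradient(w, xs, ys)  # step against the gradient

print(w, mse_loss(w, xs, ys))
```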

Ordered vector space

"Given a vector space V over the real numbers R and a preorder ≤ on the set V, the pair (V, ≤) is called a preordered vector space if for all x, y, z in V and 0 ≤ λ in R the following two axioms are satisfied

x ≤ y implies x + z ≤ y + z
y ≤ x implies λy ≤ λx.

If ≤ is a partial order, (V, ≤) is called an ordered vector space. The two axioms imply that translations and positive homotheties are automorphisms of the order structure and the mapping x ↦ −x is an isomorphism to the dual order structure. Ordered vector spaces are ordered groups under their addition operation."
https://en.wikipedia.org/wiki/Ordered_vector_space

Algebra > Vector Algebra >

Vector Ordering

"If the first nonzero component of the vector difference $A - B$ is $> 0$, then $A \succ B$. If the first nonzero component of $A - B$ is $< 0$, then $A \prec B$."

http://mathworld.wolfram.com/VectorOrdering.html
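
The ordering rule referenced above compares two vectors by the first nonzero component of their difference. A small sketch follows, assuming that a positive first nonzero component of `a - b` means `a` is the greater vector. The function name `compare_vectors` is made up for this example.

```python
def compare_vectors(a, b):
    """Return 1 if a > b, -1 if a < b, 0 if equal, using the sign of the
    first nonzero component of the difference a - b."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    for ai, bi in zip(a, b):
        diff = ai - bi
        if diff > 0:
            return 1
        if diff < 0:
            return -1
    return 0  # every component of the difference is zero
```

This is the same lexicographic rule Python already applies when comparing tuples or lists of numbers of equal length.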

Vectors:

https://www.mathsisfun.com/algebra/vectors.html

Vector: Dot Product: Costheta

https://www.mathsisfun.com/algebra/vectors-dot-product.html

Vector Cross Product
https://www.mathsisfun.com/algebra/vectors-cross-product.html

"Optimization lies at the heart of machine learning. Most machine learning problems reduce to optimization problems."
https://www.quora.com/What-is-the-relationship-between-machine-learning-and-mathematical-optimization

"Why is optimization important?
The purpose of optimization is to achieve the “best” design relative to a set of prioritized criteria or constraints. These include maximizing factors such as productivity, strength, reliability, longevity, efficiency, and utilization. ... This decision-making process is known as optimization."
https://digitalcommons.usu.edu/cgi/viewcontent.cgi?article=1031&context=ncete_publications

*** . *** . ***
Sayed Ahmed

BSc. Eng. in Comp. Sc. & Eng. (BUET)
MSc. in Comp. Sc. (U of Manitoba, Canada)
MSc. in Data Science and Analytics (Ryerson University, Canada)
Linkedin: https://ca.linkedin.com/in/sayedjustetc

Blog: http://Bangla.SaLearningSchool.com, http://SitesTree.com
Online and Offline Training: http://Training.SitesTree.com
FB Group on Learning/Teaching: https://www.facebook.com/banglasalearningschool
Our free or paid events on IT/Data Science/Cloud/Programming/Similar: https://www.facebook.com/justetcsocial

Get access to courses on Big Data, Data Science, AI, Cloud, Linux, System Admin, Web Development and Misc. related. Also, create your own course to sell to others. http://sitestree.com/training/

If you want to contribute to the operation of this site (Bangla.SaLearn) including occasional free and/or low cost online/offline training: http://Training.SitesTree.com (or charitable/non-profit work in the education/health/social service sector), you can financially contribute to: safoundation at salearningschool.com using Paypal or Credit Card (on http://sitestree.com/training/enrol/index.php?id=114 ).

AI ML DS RL DL NN NLP Data Mining Optimization, Math and Statistics for Data Science, and Engineering, ব্লগ । Blog

Design optimization

January 14, 2020 Sayed

https://medium.com/generative-design/design-optimization-2ec2ba3b40f7

Learning from nature

https://medium.com/generative-design/learning-from-nature-fe5b7290e3de


Math and Statistics for Data Science, and Engineering, ব্লগ । Blog

The World is for Polymaths: An Interview with Sajid Amit, Academic, Researcher, and Development Strategist (Part One)

January 14, 2020 Sayed

"“When you have multiple lenses with which to consider a problem, it is an incredible advantage. The world is for polymaths,” says Sajid Amit as he makes his case for pursuing an interdisciplinary approach to seeking knowledge. Knowledge is combinatorial in nature. New knowledge emerges at the intersection of distinct verticals. New ideas come into being when two or more distinct ideas interact."
https://futurestartup.com/2019/11/04/polymaths-interview-with-sajid-amit/

Math and Statistics for Data Science, and Engineering, ব্লগ । Blog

Math/Stat/CS/DS Topics that you need to know (with Cognitive, Psychomotor, Affective domain skills) to become a true and great Data Scientist

January 7, 2020 Sayed

"The core topics are cross-validation, shrinkage methods (ridge regression, the LASSO, etc.), neural networks, gradient boosting, separating hyperplanes, support vector machines, basis expansion and regularization (e.g., smoothing splines, wavelet smoothing, kernel smoothing), generalized additive models, bump hunting, multivariate adaptive regression splines (MARS), self-organizing maps, mixture model-based clustering, ensemble learning, and p>>n problems. For computing, the R software will be used as well as either Julia or Python."

"Multivariate distributions: Normal, Wishart, T2 and others; regression, correlation, factor analysis, general linear hypothesis."

"Maximum Likelihood Estimation, Cramer-Rao bound, Likelihood Ratio tests, Multi-parameter likelihood methods, Sufficient Statistics, Completeness and MVUE, Exponential Family, Functions of parameters, Uniformly Most Powerful Tests, Generalized likelihood ratio tests, Quadratic forms, Analysis of variance, Introduction to Bayesian Inference"

"general linear model. Applied regression analysis. Incomplete block designs, intra- and inter-block analysis, factorial designs. Random and mixed models. Distribution theory, hypothesis testing, computational techniques."

"Stationary, auto-regressive and moving-average series, Box-Jenkins methods, trend and seasonal effects, tests for white noise, estimation and forecasting methods, introduction to time series in the frequency domain."

"multivariate latent variable models which assume low dimensional latent variable structures for the data. Multivariate statistical methods including Principal Component Analysis (PCA), and Partial Least Squares (PLS) are used for the efficient extraction of information from large databases typically collected by on-line process computers. These models are used for the analysis of process problems, for on-line process monitoring, and for process improvement"

"Searching, optimization, online search agents. Constraint satisfaction. Knowledge, Reasoning and Planning: Logic and Inference, Planning and Acting, Knowledge Representation. Knowledge and Reasoning with Uncertainty. Machine learning problems, training and testing, overfitting. Modelling strategies: data preprocessing, overfitting and model tuning. Measuring predictor importance. Factors that Can Affect Model Performance. Feature selection. Measuring performance of classification models."

From the Contents on: https://academiccalendars.romcmaster.ca/content.php?catoid=39&navoid=8149

****** . ****** . **** . ****

AI ML DS RL DL NN NLP Data Mining Optimization, Math and Statistics for Data Science, and Engineering, ব্লগ । Blog

Part 4: Some Basic Math/Stat Concepts for the wanna be Data Scientists

December 31, 2019 Sayed

Also for the Engineers in General

Quadratic form

"In multivariate statistics, if $\varepsilon$ is a vector of $n$ random variables, and $\Lambda$ is an $n$ -dimensional symmetric matrix, then the scalar quantity $\varepsilon ^{T}\Lambda \varepsilon$ is known as a quadratic form in $\varepsilon$ ."

Ref: https://en.wikipedia.org/wiki/Quadratic_form_(statistics)

Please also check matrix related concepts. We will provide some matrix concepts at one point.

"In mathematics, a quadratic form is a polynomial with terms all of degree two. For example, $4x^2 + 2xy - 3y^2$ is a quadratic form in the variables x and y. The coefficients usually belong to a fixed field K, such as the real or complex numbers, and we speak of a quadratic form over K."

"Quadratic forms are not to be confused with a quadratic equation which has only one variable and includes terms of degree two or less. A quadratic form is one case of the more general concept of homogeneous polynomials."

Ref: https://en.wikipedia.org/wiki/Quadratic_form
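
The two definitions above (the polynomial one and the matrix one) describe the same object. As a small sketch, the polynomial $4x^2 + 2xy - 3y^2$ can be written as $v^T A v$ with a symmetric matrix. The helper name `quadratic_form` and the list-of-lists matrix representation are choices made for this illustration.

```python
def quadratic_form(matrix, vec):
    """Evaluate vec^T * matrix * vec for a square matrix given as nested lists."""
    n = len(vec)
    return sum(vec[i] * matrix[i][j] * vec[j] for i in range(n) for j in range(n))

# Symmetric matrix for 4x^2 + 2xy - 3y^2: the xy coefficient 2 is split
# evenly between the two off-diagonal entries.
A = [[4, 1],
     [1, -3]]

def polynomial(x, y):
    """The same quadratic form written out as a degree-two polynomial."""
    return 4 * x**2 + 2 * x * y - 3 * y**2
```

For any pair `(x, y)`, `quadratic_form(A, [x, y])` and `polynomial(x, y)` should agree.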

Quartic function

"
Graph of a polynomial of degree 4, with 3 critical points and four real roots (crossings of the x axis) (and thus no complex roots). If one or the other of the local minima were above the x axis, or if the local maximum were below it, or if there were no local maximum and one minimum below the x axis, there would only be two real roots (and two complex roots). If all three local extrema were above the x axis, or if there were no local maximum and one minimum above the x axis, there would be no real root (and four complex roots). The same reasoning applies in reverse to polynomial with a negative quartic coefficient.

In algebra, a quartic function is a function of the form

$f(x)=ax^{4}+bx^{3}+cx^{2}+dx+e,$

where a is nonzero, which is defined by a polynomial of degree four, called a quartic polynomial.

Sometimes the term biquadratic is used instead of quartic, but, usually, biquadratic function refers to a quadratic function of a square (or, equivalently, to the function defined by a quartic polynomial without terms of odd degree), having the form

$f(x)=ax^{4}+cx^{2}+e.$

A quartic equation, or equation of the fourth degree, is an equation that equates a quartic polynomial to zero, of the form

$ax^{4}+bx^{3}+cx^{2}+dx+e=0,$

where a ≠ 0.

The derivative of a quartic function is a cubic function."

Ref: https://en.wikipedia.org/wiki/Quartic_function

Quartic plane curve

Bivariate case

A quartic plane curve is a plane algebraic curve of the fourth degree. It can be defined by a bivariate quartic equation:

$Ax^4+By^4+Cx^3y+Dx^2y^2+Exy^3+Fx^3+Gy^3+Hx^2y+Ixy^2+Jx^2+Ky^2+Lxy+Mx+Ny+P=0,$

with at least one of A, B, C, D, E not equal to zero. This equation has 15 constants. However, it can be multiplied by any non-zero constant without changing the curve; thus by the choice of an appropriate constant of multiplication, any one of the coefficients can be set to 1, leaving only 14 constants. Therefore, the space of quartic curves can be identified with the real projective space ${\mathbb {RP}}^{{14}}$ . It also follows, from Cramer's theorem on algebraic curves, that there is exactly one quartic curve that passes through a set of 14 distinct points in general position, since a quartic has 14 degrees of freedom.

A quartic curve can have a maximum of:

Four connected components
Twenty-eight bi-tangents
Three ordinary double points.

Ref: https://en.wikipedia.org/wiki/Quartic_plane_curve

Expected value of Quadratic Forms

Expected Value :

"It can be shown that

$\operatorname {E} \left[\varepsilon ^{T}\Lambda \varepsilon \right]=\operatorname {tr} \left[\Lambda \Sigma \right]+\mu ^{T}\Lambda \mu$

where $\mu$ and $\Sigma$ are the expected value and variance-covariance matrix of $\varepsilon$ , respectively, and tr denotes the trace of a matrix. This result only depends on the existence of $\mu$ and $\Sigma$ ; in particular, normality of $\varepsilon$ is not required."

Note: you might see $\varepsilon$ replaced with x, and x' used for transpose(x).

Also, when $\mu = 0$ the second term vanishes, so the equation may appear without its second part:

$\operatorname {E} \left[\varepsilon ^{T}\Lambda \varepsilon \right]=\operatorname {tr} \left[\Lambda \Sigma \right]$

The equations above hold irrespective of the distribution of x.
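
Since the identity does not depend on the distribution, it can be checked exactly on a tiny discrete example. The sketch below uses a made-up three-point distribution in two dimensions and a made-up symmetric matrix, with exact fractions so that no rounding is involved. All names and numbers are assumptions for illustration.

```python
from fractions import Fraction as F

# Hypothetical discrete distribution over 2-D outcomes: (vector, probability).
outcomes = [((F(1), F(0)), F(1, 2)),
            ((F(0), F(2)), F(1, 4)),
            ((F(-1), F(1)), F(1, 4))]
Lam = [[F(2), F(1)],
       [F(1), F(3)]]  # symmetric matrix, chosen for illustration

def quad(v, M, w):
    """Compute v^T M w for 2-D vectors."""
    return sum(v[i] * M[i][j] * w[j] for i in range(2) for j in range(2))

# Left-hand side: E[eps^T Lam eps], computed directly from the distribution.
lhs = sum(p * quad(v, Lam, v) for v, p in outcomes)

# Mean vector and covariance matrix of the same distribution.
mu = [sum(p * v[i] for v, p in outcomes) for i in range(2)]
Sigma = [[sum(p * (v[i] - mu[i]) * (v[j] - mu[j]) for v, p in outcomes)
          for j in range(2)] for i in range(2)]

# Right-hand side: tr(Lam Sigma) + mu^T Lam mu.
trace_term = sum(Lam[i][j] * Sigma[j][i] for i in range(2) for j in range(2))
rhs = trace_term + quad(mu, Lam, mu)
```

Here `lhs` and `rhs` come out equal even though the distribution is far from normal.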

Expected value of Quartic form:

Ref: Estimation Books by Yaakov Bar-Shalom, X. Rong Li, Thiagalingam Kirubarajan

Mixture Density

"Mixture distribution. ... In cases where each of the underlying random variables is continuous, the outcome variable will also be continuous and its probability density function is sometimes referred to as a mixture density."

Ref: https://en.wikipedia.org/wiki/Mixture_distribution

Mixture PDF:

"A mixture pdf is a weighted sum of pdfs with the weights summing up to unity"

A Gaussian mixture pdf is a weighted sum of Gaussian densities.

Ref: https://www.slideshare.net/jins0618/clusteringkmeans-expectmaximization-and-gaussian-mixture-model

https://www.mathworks.com/help/stats/gmdistribution.pdf.html

http://digitalcommons.utep.edu/cgi/viewcontent.cgi?article=2110&context=cs_techrep
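
The "weighted sum of pdfs with weights summing to unity" definition can be sketched in a few lines. The two-component weights, means, and standard deviations below are hypothetical, and the crude numerical integration is only there to show that the mixture is itself a valid density.

```python
import math

def normal_pdf(x, mean, std):
    """Density of a univariate Gaussian N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def mixture_pdf(x, weights, means, stds):
    """Weighted sum of Gaussian densities; weights must sum to 1."""
    if abs(sum(weights) - 1.0) > 1e-12:
        raise ValueError("mixture weights must sum to 1")
    return sum(w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, stds))

# Hypothetical two-component mixture.
weights, means, stds = [0.3, 0.7], [-2.0, 1.0], [0.5, 1.5]

# Crude Riemann sum over [-15, 15): the mixture should integrate to about 1.
step = 0.001
area = sum(mixture_pdf(-15.0 + i * step, weights, means, stds) * step
           for i in range(30000))
```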

ML and Mixture Models:

https://www.cs.toronto.edu/~rgrosse/csc321/mixture_models.pdf

https://statweb.stanford.edu/~tibs/stat315a/LECTURES/em.pdf

Definitions: https://www.statisticshowto.datasciencecentral.com/mixture-distribution/

https://www.asc.ohio-state.edu/gan.1/teaching/spring04/Chapter3.pdf


AI ML DS RL DL NN NLP Data Mining Optimization, Math and Statistics for Data Science, and Engineering, ব্লগ । Blog

Part 3: Some Basic Math/Stat Concepts for the wanna be Data Scientists

December 30, 2019 Sayed

Conditional Probability and PDF

"The conditional probability of an event B is the probability that the event will occur given the knowledge that an event A has already occurred.

This probability is written P(B|A), notation for the probability of B given A. "

"In the case where events A and B are independent (where event A has no effect on the probability of event B), the conditional probability of event B given event A is simply the probability of event B, that is P(B).

If events A and B are not independent, then the probability of the intersection of A and B (the probability that both events occur) is defined by

P(A and B) = P(A)P(B|A)." Multiplication rule

Ref: http://www.stat.yale.edu/Courses/1997-98/101/condprob.htm
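
The multiplication rule can be verified by counting outcomes. The sketch below uses two fair dice; the events A ("first die is even") and B ("the dice sum to 8") are hypothetical choices for illustration. P(B|A) is obtained by counting only inside A, and then compared against P(A and B).

```python
from fractions import Fraction
from itertools import product

# Sample space: all ordered outcomes of two fair six-sided dice.
space = list(product(range(1, 7), repeat=2))

def first_is_even(outcome):   # hypothetical event A
    return outcome[0] % 2 == 0

def sum_is_eight(outcome):    # hypothetical event B
    return outcome[0] + outcome[1] == 8

in_a = [o for o in space if first_is_even(o)]
in_a_and_b = [o for o in in_a if sum_is_eight(o)]

p_a = Fraction(len(in_a), len(space))
p_b_given_a = Fraction(len(in_a_and_b), len(in_a))   # B counted within A only
p_a_and_b = Fraction(len(in_a_and_b), len(space))
```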

Truncated Distribution
"In statistics, a truncated distribution is a conditional distribution that results from restricting the domain of some other probability distribution. Truncated distributions arise in practical statistics in cases where the ability to record, or even to know about, occurrences is limited to values which lie above or below a given threshold or within a specified range.

For example, if the dates of birth of children in a school are examined, these would typically be subject to truncation relative to those of all children in the area given that the school accepts only children in a given age range on a specific date. There would be no information about how many children in the locality had dates of birth before or after the school's cutoff dates if only a direct approach to the school were used to obtain information."

Figure caption: Probability density function for the truncated normal distribution for different sets of parameters. In all cases, a = −10 and b = 10. For the black: μ = −8, σ = 2; blue: μ = 0, σ = 2; red: μ = 9, σ = 10; orange: μ = 0, σ = 10.

Here $g$ is the density and $F$ the cumulative distribution function of the original (untruncated) distribution:

Support: $x \in (a,b]$
PDF: $\frac{g(x)}{F(b)-F(a)}$
CDF: ${\frac {\int _{a}^{x}g(t)dt}{F(b)-F(a)}}={\frac {F(x)-F(a)}{F(b)-F(a)}}$
Mean: $\frac{\int_a^b x g(x) dx}{F(b)-F(a)}$
Median: $F^{-1}\left({\frac {F(a)+F(b)}{2}}\right)$

https://en.wikipedia.org/wiki/Truncated_distribution
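
As a sketch of the PDF formula $g(x)/(F(b)-F(a))$, the truncated normal density can be built from the ordinary normal density and CDF using only the standard library. The parameters reuse one of the cases named in the figure caption (μ = 0, σ = 10, a = −10, b = 10); the function names are made up for this example.

```python
import math

def norm_pdf(x, mu, sigma):
    """Density g(x) of the untruncated normal distribution."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def norm_cdf(x, mu, sigma):
    """CDF F(x) of the untruncated normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def truncated_pdf(x, mu, sigma, a, b):
    """g(x) / (F(b) - F(a)) on (a, b], and 0 outside that interval."""
    if not (a < x <= b):
        return 0.0
    return norm_pdf(x, mu, sigma) / (norm_cdf(b, mu, sigma) - norm_cdf(a, mu, sigma))

mu, sigma, a, b = 0.0, 10.0, -10.0, 10.0

# Crude Riemann sum over (a, b]: the truncated density should integrate to about 1.
step = 0.001
area = sum(truncated_pdf(a + (i + 1) * step, mu, sigma, a, b) * step
           for i in range(20000))
```

Inside the interval the truncated density is larger than the original one, because the probability mass that was cut off is redistributed by the division.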

Law of Total Probability

Bayes’s theorem

In probability theory and statistics, Bayes’s theorem describes the probability of an event, based on prior knowledge of conditions that might be related to the event. Wikipedia

Formula
$P(A\mid B)=\frac {P(B\mid A) \cdot P(A)}{P(B)}$

A, B	=	events
P(A|B)	=	probability of A given B is true
P(B|A)	=	probability of B given A is true
P(A), P(B)	=	the independent probabilities of A and B

Ref: https://en.wikipedia.org/wiki/Bayes'_theorem
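
A short worked sketch of the formula, where P(B) in the denominator is expanded with the law of total probability: P(B) = P(B|A)P(A) + P(B|not A)P(not A). The numbers (a 1% prior, a 95% hit rate, a 5% false-alarm rate) are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from fractions import Fraction

def bayes_posterior(prior_a, p_b_given_a, p_b_given_not_a):
    """P(A|B) via Bayes's theorem, with P(B) from the law of total probability."""
    p_b = p_b_given_a * prior_a + p_b_given_not_a * (1 - prior_a)
    return p_b_given_a * prior_a / p_b

# Hypothetical inputs: P(A) = 1/100, P(B|A) = 95/100, P(B|not A) = 5/100.
posterior = bayes_posterior(Fraction(1, 100), Fraction(95, 100), Fraction(5, 100))
```

With these inputs the posterior is 19/118, roughly 0.16: even a fairly accurate signal B leaves A unlikely when the prior P(A) is small.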

Bayes Formula for Random Variables:

http://pwp.gatech.edu/ece-jrom/wp-content/uploads/sites/436/2017/08/16_BayesRVs-su14.pdf

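Using the above equation, the Bayes rule for discrete random variables X and Y:

$p_{X\mid Y}(x\mid y)=\frac {p_{Y\mid X}(y\mid x)\,p_{X}(x)}{\sum _{x'}p_{Y\mid X}(y\mid x')\,p_{X}(x')}$

Bayes formula for Continuous Random Variable

$f_{X\mid Y}(x\mid y)=\frac {f_{Y\mid X}(y\mid x)\,f_{X}(x)}{\int f_{Y\mid X}(y\mid x')\,f_{X}(x')\,dx'}$

Conditional Expectation : Discrete Case

$\operatorname {E} [X\mid Y=y]=\sum _{x}x\,p_{X\mid Y}(x\mid y)$

Conditional Expectation : Continuous Case

$\operatorname {E} [X\mid Y=y]=\int x\,f_{X\mid Y}(x\mid y)\,dx$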

Ref: https://www.math.arizona.edu/~tgk/464_07/cond_exp.pdf

Gaussian Random Variables:

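The PDF:

$f_{X}(x)=\frac {1}{\sqrt {2\pi \sigma ^{2}}}\exp \left(-\frac {(x-\mu )^{2}}{2\sigma ^{2}}\right)$

where $\mu$ is the mean and $\sigma ^{2}$ is the variance.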

Ref: https://www.sciencedirect.com/topics/engineering/gaussian-random-variable

Gaussian Random Vector:

Ref: http://statweb.stanford.edu/~kjross/Lec11_1015.pdf

The text and images are from the Internet. References are provided.


Math and Statistics for Data Science, and Engineering, ব্লগ । Blog

Part 2: Some basic Math/Statistics concepts that Data Scientists (the true ones) will usually know/use

December 29, 2019 Sayed

Part 2: Some basic Math/Statistics concepts that Data Scientists (the true ones) will usually know/use (came across, studied, learned, used)

Covariance and Correlation

"Covariance is a measure of how two variables change together, but its magnitude is unbounded, so it is difficult to interpret. By dividing covariance by the product of the two standard deviations, one can calculate the normalized version of the statistic. This is the correlation coefficient." https://www.investopedia.com/terms/c/correlationcoefficient.asp on Investing and Covariance/Correlation

Covariance and expected value

"Covariance is calculated as expected value or average of the product of the differences of each random variable from their expected values, where E[X] is the expected value for X and E[Y] is the expected value of Y."
cov(X, Y) = E[(X - E[X]) * (Y - E[Y])]
Population covariance: cov(X, Y) = sum (x - E[X]) * (y - E[Y]) * 1/n

Sample covariance: cov(X, Y) = sum (x - E[X]) * (y - E[Y]) * 1/(n - 1)
Ref: https://machinelearningmastery.com/introduction-to-expected-value-variance-and-covariance/
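
The covariance formulas above, and the normalization that turns covariance into a correlation coefficient, can be sketched with the standard library alone. The paired observations are hypothetical, and the function names are choices made for this example.

```python
import math

def mean(values):
    return sum(values) / len(values)

def covariance(xs, ys, sample=True):
    """Average product of deviations; divides by n - 1 for a sample, n otherwise."""
    mx, my = mean(xs), mean(ys)
    total = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return total / (len(xs) - 1 if sample else len(xs))

def correlation(xs, ys):
    """Covariance divided by the product of the two standard deviations."""
    sx = math.sqrt(covariance(xs, xs))
    sy = math.sqrt(covariance(ys, ys))
    return covariance(xs, ys) / (sx * sy)

# Hypothetical paired observations.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
r = correlation(xs, ys)
```

Unlike covariance, `r` always lies between -1 and +1, and `r ** 2` is the fraction of variation that a simple linear fit would account for.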

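Formula for continuous variables

$\operatorname {Cov} [X,Y]=\int _{-\infty }^{\infty }\int _{-\infty }^{\infty }(x-\operatorname {E} [X])(y-\operatorname {E} [Y])\,f_{XY}(x,y)\,dy\,dx$

where $f_{XY}(x,y)$ is the joint probability density function of $X$ and $Y$.

Formula for Discrete Variables

$\operatorname {Cov} [X,Y]=\sum _{x}\sum _{y}(x-\operatorname {E} [X])(y-\operatorname {E} [Y])\,p_{XY}(x,y)$

where $p_{XY}(x,y)$ is the joint probability mass function of $X$ and $Y$.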

Reference: https://www.statlect.com/glossary/covariance-formula

Correlation:

What is Correlation?

"Correlation, in the finance and investment industries, is a statistic that measures the degree to which two securities move in relation to each other. Correlations are used in advanced portfolio management, computed as the correlation coefficient, which has a value that must fall between -1.0 and +1.0." Ref: https://www.investopedia.com/terms/c/correlation.asp

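Correlation formula:

$r=\frac {1}{n-1}\sum \left(\frac {x-{\bar {x}}}{s_{x}}\right)\left(\frac {y-{\bar {y}}}{s_{y}}\right)$

where ${\bar {x}}$, ${\bar {y}}$ are the sample means and $s_{x}$, $s_{y}$ are the sample standard deviations.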

Ref: http://www.stat.yale.edu/Courses/1997-98/101/correl.htm

Correlation in Linear Regression:

"The square of the correlation coefficient, r², is a useful value in linear regression. This value represents the fraction of the variation in one variable that may be explained by the other variable. Thus, if a correlation of 0.8 is observed between two variables (say, height and weight, for example), then a linear regression model attempting to explain either variable in terms of the other variable will account for 64% of the variability in the data."

http://www.stat.yale.edu/Courses/1997-98/101/correl.htm

http://sphweb.bumc.bu.edu/otlt/MPH-Modules/BS/BS704_Multivariable/BS704_Multivariable5.html

Ref: http://ci.columbia.edu/ci/premba_test/c0331/s7/s7_5.html

Independent Events

"In probability, two events are independent if the incidence of one event does not affect the probability of the other event. If the incidence of one event does affect the probability of the other event, then the events are dependent."

Ref: https://brilliant.org/wiki/probability-independent-events/

How do you know if an event is independent?

"To test whether two events A and B are independent, calculate P(A), P(B), and P(A ∩ B), and then check whether P(A ∩ B) equals P(A)P(B). If they are equal, A and B are independent; if not, they are dependent. 1. You throw two fair dice, one green and one red, and observe the numbers uppermost."
Ref: https://www.zweigmedia.com/RealWorld/tutorialsf15e/frames7_5C.html
With Examples: https://www.mathsisfun.com/data/probability-events-independent.html
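
The independence test described above (compare P(A ∩ B) with P(A)P(B)) can be run by brute-force counting on the two-dice experiment. The specific events below are hypothetical examples, one independent pair and one dependent pair.

```python
from fractions import Fraction
from itertools import product

# Sample space: (green, red) outcomes of two fair dice.
space = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event (a predicate on outcomes) under equal likelihood."""
    return Fraction(sum(1 for o in space if event(o)), len(space))

def are_independent(event_a, event_b):
    """True when P(A and B) equals P(A) * P(B) exactly."""
    p_both = prob(lambda o: event_a(o) and event_b(o))
    return p_both == prob(event_a) * prob(event_b)

def green_even(o):
    return o[0] % 2 == 0

def sum_is_seven(o):
    return o[0] + o[1] == 7

def sum_is_eight(o):
    return o[0] + o[1] == 8
```

With these events, "green is even" and "sum is 7" pass the test, while "green is even" and "sum is 8" do not.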

Joint Distributions and Independence

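The joint PMF of $X_1, X_2, \cdots, X_n$ is defined as
$P_{X_1,X_2,...,X_n}(x_1,x_2,...,x_n)=P(X_1=x_1,X_2=x_2,...,X_n=x_n).$

For the continuous case:
$P\big((X_1,X_2,\cdots,X_n)\in A\big)=\int \cdots \int_A \cdots \int f_{X_1X_2\cdots X_n}(x_1,x_2,\cdots,x_n)\,dx_1\,dx_2\cdots dx_n.$

The marginal PDF of $X_i$ is obtained by integrating out all the other variables; for example, for $X_1$:
$f_{X_1}(x_1)=\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty}f_{X_1X_2...X_n}(x_1,x_2,...,x_n)\,dx_2\cdots dx_n.$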

Ref: https://www.probabilitycourse.com/chapter6/6_1_1_joint_distributions_independence.php

Random Vectors, Random Matrices, and Their Expected Values

http://www.statpower.net/Content/313/Lecture%20Notes/MatrixExpectedValue.pdf

Random Variables and Probability Distributions: https://www.stat.pitt.edu/stoffer/tsa4/intro_prob.pdf

What are moments of a random variable?

"The “moments” of a random variable (or of its distribution) are expected values of powers or related functions of the random variable. The rth moment of X is $E(X^r)$. In particular, the first moment is the mean, $\mu_X = E(X)$. The mean is a measure of the “center” or “location” of a distribution." Ref: http://homepages.gac.edu/~holte/courses/mcs341/fall10/documents/sect3-3a.pdf

Characteristic function (probability theory)

Figure caption: The characteristic function of a uniform U(–1,1) random variable. This function is real-valued because it corresponds to a random variable that is symmetric around the origin; however characteristic functions may generally be complex-valued.

"In probability theory and statistics, the characteristic function of any real-valued random variable completely defines its probability distribution. If a random variable admits a probability density function, then the characteristic function is the Fourier transform of the probability density function."

Ref: https://en.wikipedia.org/wiki/Characteristic_function_(probability_theory)

Functions of random vectors and their distribution

https://www.statlect.com/fundamentals-of-probability/functions-of-random-vectors

Ref: https://books.google.com/books?id=xz9nQ4wdXG4C&pg=PA42&lpg=PA42&dq=the+characteristic+functions+of+a+vector+random+variable+is+shalom+kiruba&source=bl&ots=VqY-t6i-u2&sig=ACfU3U09k0CHqK_9Lowd8MkLoyjo1ela1Q&hl=en&sa=X&ved=2ahUKEwiaw9mQ0dvmAhXBmeAKHaSgDUgQ6AEwCXoECAkQAQ#v=onepage&q=the%20characteristic%20functions%20of%20a%20vector%20random%20variable%20is%20shalom%20kiruba&f=false

Math and Statistics for Data Science, and Engineering, ব্লগ । Blog

Part 1: Some Math/Stat Background that (true) Data Scientists will know/use: from the internet

December 28, 2019 Sayed

Chebyshev's inequality

"In probability theory, Chebyshev's inequality (also called the Bienaymé–Chebyshev inequality) guarantees that, for a wide class of probability distributions, no more than a certain fraction of values can be more than a certain distance from the mean.

Specifically, no more than $1/k^2$ of the distribution's values can be more than k standard deviations away from the mean

(equivalently, at least $1 - 1/k^2$ of the distribution's values are within k standard deviations of the mean).

The inequality has great utility because it can be applied to any probability distribution in which the mean and variance are defined."

Ref: https://en.wikipedia.org/wiki/Chebyshev%27s_inequality

Probabilistic statement

Let X (integrable) be a random variable with finite expected value $\mu$ and finite non-zero variance $\sigma^2$. Then for any real number k > 0,

$\Pr(|X-\mu |\geq k\sigma )\leq {\frac {1}{k^{2}}}.$

Only the case $k > 1$ is useful. When $k\leq 1$ the right-hand side ${\frac {1}{k^{2}}}\geq 1$ and the inequality is trivial as all probabilities are ≤ 1.

As an example, using $k={\sqrt {2}}$ shows that the probability that values lie outside the interval $(\mu -{\sqrt {2}}\sigma ,\mu +{\sqrt {2}}\sigma )$ does not exceed ${\frac {1}{2}}$ .

Ref: https://en.wikipedia.org/wiki/Chebyshev%27s_inequality
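
Chebyshev's bound can be checked on any finite data set by treating it as its own distribution. The sketch below uses a small, deliberately skewed, made-up data set and the population standard deviation (dividing by n), so that the bound applies exactly to the observed fractions.

```python
import math

def chebyshev_check(values, k):
    """Return (observed fraction with |x - mean| >= k*std, the bound 1/k^2)."""
    n = len(values)
    mu = sum(values) / n
    # Population standard deviation (divide by n), so the bound applies exactly
    # to the empirical distribution of `values`.
    sigma = math.sqrt(sum((x - mu) ** 2 for x in values) / n)
    outside = sum(1 for x in values if abs(x - mu) >= k * sigma)
    return outside / n, 1.0 / k ** 2

# Hypothetical, deliberately skewed data set.
data = [1, 1, 1, 2, 2, 3, 3, 4, 8, 25]
fraction_outside, bound = chebyshev_check(data, 2)
```

The observed fraction is typically far below the bound, which illustrates that Chebyshev's inequality is a worst-case guarantee rather than a tight estimate.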

"Markov's inequality

"Markov's inequality (and other similar inequalities) relate probabilities to expectations, and provide (frequently loose but still useful) bounds for the cumulative distribution function of a random variable."

Statement

"If X is a nonnegative random variable and a > 0, then the probability that X is at least a is at most the expectation of X divided by a:

$\operatorname {P} (X\geq a)\leq {\frac {\operatorname {E} (X)}{a}}.$

Let $a={\tilde {a}}\cdot \operatorname {E} (X)$ (where ${\tilde {a}}>0$ ); then we can rewrite the previous inequality as

$\operatorname {P} (X\geq {\tilde {a}}\cdot \operatorname {E} (X))\leq {\frac {1}{\tilde {a}}}.$"

Ref: https://en.wikipedia.org/wiki/Markov%27s_inequality
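
Markov's inequality can be illustrated the same way on a finite set of nonnegative values. The data below are hypothetical; the function simply compares the observed tail fraction with the bound E(X)/a.

```python
def markov_check(values, a):
    """Return (observed fraction with x >= a, the bound mean/a) for nonnegative data."""
    if a <= 0 or any(x < 0 for x in values):
        raise ValueError("Markov's inequality needs a > 0 and nonnegative values")
    mean = sum(values) / len(values)
    tail = sum(1 for x in values if x >= a) / len(values)
    return tail, mean / a

# Hypothetical nonnegative observations.
waits = [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
tail_at_20, bound_at_20 = markov_check(waits, 20)
```

For small thresholds the bound exceeds 1 and says nothing; it only becomes informative once the threshold is larger than the mean.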

Check Null Hypothesis concept as well as Chi Square Test here: http://bangla.salearningschool.com/recent-posts/important-basic-concepts-statistics-for-big-data/

Chi-Square Statistic:

"A chi square (χ²) statistic is a test that measures how expectations compare to actual observed data (or model results)."

https://www.investopedia.com/terms/c/chi-square-statistic.asp
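
One common way to compare expectations with observed data is Pearson's statistic, the sum over categories of (observed − expected)² / expected. The sketch below assumes that form; the die-roll counts are hypothetical.

```python
def chi_square_statistic(observed, expected):
    """Pearson's statistic: sum over categories of (O - E)^2 / E."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical die-roll counts from 60 throws; a fair die expects 10 per face.
observed = [8, 9, 12, 11, 6, 14]
expected = [10] * 6
stat = chi_square_statistic(observed, expected)
```

To turn the statistic into a decision it is compared with a chi-square critical value for the appropriate degrees of freedom (here 6 − 1 = 5; the usual 5% critical value is about 11.07).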

"What does chi square test tell you?

The Chi-square test is intended to test how likely it is that an observed distribution is due to chance. It is also called a "goodness of fit" statistic, because it measures how well the observed distribution of data fits with the distribution that is expected if the variables are independent."

https://www.ling.upenn.edu/~clight/chisquared.htm

"In probability theory and statistics, the chi-square distribution (also chi-squared or χ²-distribution) with k degrees of freedom is the distribution of a sum of the squares of k independent standard normal random variables. The chi-square distribution is a special case of the gamma distribution and is one of the most widely used probability distributions in inferential statistics, notably in hypothesis testing and in construction of confidence intervals. When it is being distinguished from the more general noncentral chi-square distribution, this distribution is sometimes called the central chi-square distribution." Ref: https://en.wikipedia.org/wiki/Chi-squared_distribution

"A chi-squared test, also written as χ² test, is any statistical hypothesis test where the sampling distribution of the test statistic is a chi-squared distribution when the null hypothesis is true. Without other qualification, 'chi-squared test' often is used as short for Pearson's chi-squared test. The chi-squared test is used to determine whether there is a significant difference between the expected frequencies and the observed frequencies in one or more categories." Ref: https://en.wikipedia.org/wiki/Chi-squared_test

Statistical Significance Tests for Comparing Machine Learning Algorithms

Learn

Statistical hypothesis tests can aid in comparing machine learning models and choosing a final model.
The naive application of statistical hypothesis tests can lead to misleading results.
Correct use of statistical tests is challenging, and there is some consensus for using the McNemar’s test or 5×2 cross-validation with a modified paired Student t-test.

https://machinelearningmastery.com/statistical-significance-tests-for-comparing-machine-learning-algorithms/

Probability Axioms (I am not convinced that the following is the best way to say)

Axiom 1: The probability of an event is a real number greater than or equal to 0.
Axiom 2: The probability that at least one of all the possible outcomes of a process (such as rolling a die) will occur is 1.
Axiom 3: If two events A and B are mutually exclusive, then the probability of either A or B occurring is the probability of A occurring plus the probability of B occurring.

https://plus.maths.org/content/maths-minute-axioms-probability

1. Probability is non-negative

2. P{S} = 1

3. Probability is additive

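If A and B are two mutually exclusive (disjoint) events:

P(A ∪ B) = P(A) + P(B)

P(A ∩ B) = P(∅) = 0 [nothing in common]

Note that mutually exclusive is not the same as independent: if A and B both have nonzero probability and cannot occur together, then knowing that one occurred rules out the other, so they are dependent.

P(A) = 1 − P(A'), where A' is the complement of A

P(∅) = 0, where ∅ is the empty event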

What does probability density function mean?

"Probability density function (PDF) is a statistical expression that defines a probability distribution for a continuous random variable as opposed to a discrete random variable. When the PDF is graphically portrayed, the area under the curve will indicate the interval in which the variable will fall" https://www.investopedia.com/terms/p/pdf.asp

"A probability density function is most commonly associated with absolutely continuous univariate distributions. A random variable $X$ has density $f_X$ , where $f_X$ is a non-negative Lebesgue-integrable function, if:
$\Pr[a\leq X\leq b]=\int _{a}^{b}f_{X}(x)\,dx.$

Hence, if $F_{X}$ is the cumulative distribution function of $X$ , then:

$F_{X}(x)=\int _{-\infty }^{x}f_{X}(u)\,du,$

and, if $f_X$ is continuous at $x$,

$f_{X}(x)={\frac {d}{dx}}F_{X}(x).$

Intuitively, one can think of $f_{X}(x)\,dx$ as being the probability of $X$ falling within the infinitesimal interval $[x,x+dx]$ ."
https://en.wikipedia.org/wiki/Probability_density_function

Probability mass function

Figure caption: The graph of a probability mass function. All the values of this function must be non-negative and sum up to 1.

"In probability and statistics, a probability mass function (PMF) is a function that gives the probability that a discrete random variable is exactly equal to some value. Sometimes it is also known as the discrete density function. The probability mass function is often the primary means of defining a discrete probability distribution, and such functions exist for either scalar or multivariate random variables whose domain is discrete.

A probability mass function differs from a probability density function (PDF) in that the latter is associated with continuous rather than discrete random variables. A PDF must be integrated over an interval to yield a probability.

The value of the random variable having the largest probability mass is called the mode." https://en.wikipedia.org/wiki/Probability_mass_function

4.3.1 Mixed Random Variables

Here, we will discuss mixed random variables. These are random variables that are neither discrete nor continuous, but are a mixture of both. In particular, a mixed random variable has a continuous part and a discrete part.

https://www.probabilitycourse.com/chapter4/4_3_1_mixed.php . Also check the examples from here

Expected values of a random variable
The expected value of a discrete random variable is the probability-weighted average of all its possible values. In other words, each possible value the random variable can assume is multiplied by its probability of occurring, and the resulting products are summed to produce the expected value.
https://en.wikipedia.org/wiki/Expected_value
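
The "probability-weighted average" description translates directly into code. The sketch below represents a PMF as a dictionary from values to probabilities (a representation chosen for this example) and uses a fair six-sided die. The same pattern, applied to powers of the values, gives the raw moments $E(X^r)$.

```python
from fractions import Fraction

def expected_value(pmf):
    """Probability-weighted average of the values in a {value: probability} map."""
    if sum(pmf.values()) != 1:
        raise ValueError("probabilities must sum to 1")
    return sum(value * p for value, p in pmf.items())

def raw_moment(pmf, r):
    """E(X^r): the expected value of the r-th power of the variable."""
    return sum(value ** r * p for value, p in pmf.items())

fair_die = {face: Fraction(1, 6) for face in range(1, 7)}

mean = expected_value(fair_die)           # first moment
second_moment = raw_moment(fair_die, 2)
variance = second_moment - mean ** 2      # Var(X) = E(X^2) - E(X)^2
```

For the fair die this gives a mean of 7/2 and a variance of 35/12.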

The “moments” of a random variable

The “moments” of a random variable (or of its distribution) are expected values of powers or related functions of the random variable. The rth moment of X is $E(X^r)$. In particular, the first moment is the mean, $\mu_X = E(X)$. The mean is a measure of the “center” or “location” of a distribution.

http://homepages.gac.edu/~holte/courses/mcs341/fall10/documents/sect3-3a.pdf

Joint distributions

"Joint distributions Notes: Below X and Y are assumed to be continuous random variables. This case is, by far, the most important case. Analogous formulas, with sums replacing integrals and p.m.f.’s instead of p.d.f.’s, hold for the case when X and Y are discrete r.v.’s. Appropriate analogs also hold for mixed cases (e.g., X discrete, Y continuous), and for the more general case of n random variables X1, . . . , Xn.

• Joint cumulative distribution function (joint c.d.f.): F(x, y) = P(X ≤ x, Y ≤ y)"

https://faculty.math.illinois.edu/~hildebr/461/jointdistributions.pdf

The above were mostly from the Internet and as is.

AI ML DS RL DL NN NLP Data Mining Optimization, Math and Statistics for Data Science, and Engineering, ব্লগ । Blog

Overview on optimization concepts: From the Internet

December 10, 2019 Sayed

Optimization Concepts:

Convex sets:
"A convex set is a set of points such that, given any two points A, B in that set, the line AB joining them lies entirely within that set. Intuitively, this means that the set is connected (so that you can pass between any two points without leaving the set) and has no dents in its perimeter."
Convexity/What is a convex set? - Wikibooks
https://en.wikibooks.org/wiki/Convexity/What_is_a_convex_set%3F

Convex functions:
"A convex function is a real-valued function defined on an interval with the property that its epigraph (the set of points on or above the graph of the function) is a convex set. Convex minimization is a subfield of optimization that studies the problem of minimizing convex functions over convex sets."
Convex set - Wikipedia
https://en.wikipedia.org/wiki/Convex_set
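
The epigraph definition above is equivalent to the more commonly used inequality below. Here $I$ denotes the interval on which $f$ is defined; the symbols are assumptions introduced for this sketch.

$$f\big(t x + (1 - t) y\big) \le t f(x) + (1 - t) f(y) \quad \text{for all } x, y \in I \text{ and } t \in [0, 1].$$

For example, $f(x) = x^2$ satisfies this everywhere, while $f(x) = x^3$ fails on the whole real line: with $x = -2$, $y = 0$, $t = 1/2$, the left side is $-1$ and the right side is $-4$.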

Optimization problems:
Interesting simple optimization problems and solutions:
http://tutorial.math.lamar.edu/Classes/CalcI/Optimization.aspx
More Simple Optimization Problems and Solutions:
https://www.khanacademy.org/search?page_search_query=Optimization%20problems%20(calculus)

Basics of convex analysis:
https://en.wikipedia.org/wiki/Convex_analysis
A good overview: http://eceweb.ucsd.edu/~gert/ECE273/CvxOptTutPaper.pdf

least-squares:
"The least squares method is a statistical procedure to find the best fit for a set of data points by minimizing the sum of the offsets or residuals of points from the plotted curve. Least squares regression is used to predict the behavior of dependent variables."
Least Squares Method Definition - Investopedia (Sep 2, 2019)
https://www.investopedia.com › terms › least-squares-method

"minimizing the sum of the squares of the residuals made in the results of every single equation."
"The most important application is in data fitting. The best fit in the least-squares sense minimizes the sum of squared residuals (a residual being: the difference between an observed value, and the fitted value provided by a model)."
"Least-squares problems fall into two categories: linear or ordinary least squares and nonlinear least squares, depending on whether or not the residuals are linear in all unknowns. The linear least-squares problem occurs in statistical regression analysis; it has a closed-form solution. The nonlinear problem is usually solved by iterative refinement; at each iteration the system is approximated by a linear one, and thus the core calculation is similar in both cases.

Polynomial least squares describes the variance in a prediction of the dependent variable as a function of the independent variable and the deviations from the fitted curve.

When the observations come from an exponential family and mild conditions are satisfied, least-squares estimates and maximum-likelihood estimates are identical. The method of least squares can also be derived as a method of moments estimator.

The following discussion is mostly presented in terms of linear functions but the use of least squares is valid and practical for more general families of functions. Also, by iteratively applying local quadratic approximation to the likelihood (through the Fisher information), the least-squares method may be used to fit a generalized linear model."
https://en.wikipedia.org/wiki/Least_squares
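
To make the "closed-form solution" of the linear case concrete, here is a minimal pure-Python sketch that fits a straight line to a handful of points. The data values and the function names `fit_line` and `sum_squared_residuals` are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y ~ slope * x + intercept (closed form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def sum_squared_residuals(xs, ys, slope, intercept):
    """Sum of squared differences between observed and fitted values."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))

# Made-up observations.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 7.1, 8.8]

slope, intercept = fit_line(xs, ys)
best = sum_squared_residuals(xs, ys, slope, intercept)
print(slope, intercept, best)
```

Any other line, for instance one with a slightly different slope, gives a larger sum of squared residuals on the same data, which is exactly what "best fit in the least-squares sense" means.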

linear and quadratic programs:
"A linear programming (LP) problem is one in which the objective and all of the constraints are linear functions of the decision variables."
"A quadratic programming (QP) problem has an objective which is a quadratic function of the decision variables, and constraints which are all linear functions of the variables."
"LP problems are usually solved via the Simplex method. "
"An alternative to the Simplex method, called the Interior Point or Newton-Barrier method, was developed by Karmarkar in 1984. Also in the last decade, this method has been dramatically enhanced with advanced linear algebra methods so that it is often competitive with the Simplex method, especially on very large problems."
"Since a QP problem is a special case of a smooth nonlinear problem, it can be solved by a smooth nonlinear optimization method such as the GRG or SQP method. However, a faster and more reliable way to solve a QP problem is to use an extension of the Simplex method or an extension of the Interior Point or Barrier method."
https://www.solver.com/optimization-problem-types-linear-and-quadratic-programming
Quadratic programming
https://optimization.mccormick.northwestern.edu/index.php/Quadratic_programming
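
As a toy illustration of a linear program, the sketch below solves a two-variable LP by brute force: it enumerates the corner points of the feasible region and picks the best one. This is not the Simplex or Interior Point method mentioned above; it only relies on the fact that a bounded, feasible LP attains its optimum at a vertex. The problem data are hypothetical.

```python
from itertools import combinations

# Constraints in the form a*x + b*y <= c (hypothetical toy problem).
constraints = [
    (1.0, 1.0, 4.0),   # x + y <= 4
    (1.0, 3.0, 6.0),   # x + 3y <= 6
    (-1.0, 0.0, 0.0),  # x >= 0
    (0.0, -1.0, 0.0),  # y >= 0
]
objective = (3.0, 2.0)  # maximize 3x + 2y

def intersect(c1, c2):
    """Intersection of two constraint boundary lines, or None if parallel."""
    a1, b1, r1 = c1
    a2, b2, r2 = c2
    det = a1 * b2 - a2 * b1
    if abs(det) < 1e-12:
        return None
    return ((r1 * b2 - r2 * b1) / det, (a1 * r2 - a2 * r1) / det)

def feasible(point, tol=1e-9):
    """True if the point satisfies every constraint (within a tolerance)."""
    x, y = point
    return all(a * x + b * y <= c + tol for a, b, c in constraints)

def value(point):
    """Objective value at the given point."""
    return objective[0] * point[0] + objective[1] * point[1]

# Candidate vertices are feasible intersections of pairs of boundary lines.
vertices = []
for c1, c2 in combinations(constraints, 2):
    point = intersect(c1, c2)
    if point is not None and feasible(point):
        vertices.append(point)

best = max(vertices, key=value)
best_value = value(best)
print(best, best_value)
```

Enumerating vertices grows combinatorially with the number of constraints, which is why practical solvers use the methods described in the quotes instead.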

semidefinite programming:
"Semidefinite programming - Wikipedia
https://en.wikipedia.org/wiki/Semidefinite_programming
Semidefinite programming (SDP) is a subfield of convex optimization concerned with the optimization of a linear objective function (a user-specified function that the user wants to minimize or maximize) over the intersection of the cone of positive semidefinite matrices with an affine space, i.e., a spectrahedron."

"Semidefinite Programming
https://web.stanford.edu › ~boyd › papers › sdp
In semidefinite programming we minimize a linear function subject to the constraint that an affine combination of symmetric matrices is positive semidefinite. Such a constraint is nonlinear and nonsmooth, but convex, so positive definite programs are convex optimization problems."

https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-251j-introduction-to-mathematical-programming-fall-2009/readings/MIT6_251JF09_SDP.pdf

minimax:

"Minimax - Wikipedia
https://en.wikipedia.org/wiki/Minimax
Minimax (sometimes MinMax, MM or saddle point) is a decision rule used in artificial intelligence, decision theory, game theory, statistics and philosophy for minimizing the possible loss for a worst case (maximum loss) scenario. When dealing with gains, it is referred to as "maximin": to maximize the minimum gain."
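
The decision rule above can be sketched on a small table. The decisions "A", "B", "C" and all the numbers are hypothetical and only serve to show the mechanics.

```python
# Hypothetical table: rows are decisions, columns are possible scenarios.
table = {
    "A": [3, 8, 2],
    "B": [5, 4, 6],
    "C": [9, 1, 7],
}

# Minimax: read the numbers as losses and pick the decision whose
# worst-case (maximum) loss is smallest.
worst_loss = {decision: max(row) for decision, row in table.items()}
minimax_choice = min(worst_loss, key=worst_loss.get)

# Maximin: read the same numbers as gains and pick the decision whose
# worst-case (minimum) gain is largest.
worst_gain = {decision: min(row) for decision, row in table.items()}
maximin_choice = max(worst_gain, key=worst_gain.get)

print(minimax_choice, worst_loss[minimax_choice])  # B 6
print(maximin_choice, worst_gain[maximin_choice])  # B 4
```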

duality theory:
"In mathematical optimization theory, duality or the duality principle is the principle that optimization problems may be viewed from either of two perspectives, the primal problem or the dual problem. The solution to the dual problem provides a lower bound to the solution of the primal (minimization) problem."
Duality (optimization) - Wikipedia
https://en.wikipedia.org/wiki/Duality_(optimization)
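
The lower-bound property can be written out for a standard inequality-constrained problem. The notation ($f_0$, $f_i$, $\lambda$, $p^*$, $d^*$) is an assumption borrowed from common textbook usage, not from the quoted article. Take the primal problem of minimizing $f_0(x)$ subject to $f_i(x) \le 0$ for $i = 1, \dots, m$, with optimal value $p^*$.

$$L(x, \lambda) = f_0(x) + \sum_{i=1}^{m} \lambda_i f_i(x), \qquad \lambda_i \ge 0,$$

$$g(\lambda) = \inf_x L(x, \lambda), \qquad d^* = \sup_{\lambda \ge 0} g(\lambda) \le p^*.$$

The inequality holds because, for any feasible $x$ and any $\lambda \ge 0$, each term $\lambda_i f_i(x)$ is nonpositive, so $g(\lambda) \le L(x, \lambda) \le f_0(x)$.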
"Usually the term "dual problem" refers to the Lagrangian dual problem but other dual problems are used – for example, the Wolfe dual problem and the Fenchel dual problem. The Lagrangian dual problem is obtained by forming the Lagrangian of a minimization problem by using nonnegative Lagrange multipliers to add the constraints to the objective function, and then solving for the primal variable values that minimize the original objective function. This solution gives the primal variables as functions of the Lagrange multipliers, which are called dual variables, so that the new problem is to maximize the objective function with respect to the dual variables under the derived constraints on the dual variables (including at least the nonnegativity constraints)."

theorems of alternative:
"Farkas' lemma belongs to a class of statements called "theorems of the alternative": a theorem stating that exactly one of two systems has a solution."
Farkas' lemma - Wikipedia
https://en.wikipedia.org/wiki/Farkas'_lemma

"In layman's terms, a Theorem of the Alternative is a theorem which states that given two conditions, one of the two conditions is true. It further states that if one of those conditions fails to be true, then the other condition must be true." (May 8, 1991)
http://digitalcommons.iwu.edu/cgi/viewcontent.cgi?article=1000&context=math_honproj
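
For reference, one common form of Farkas' lemma is sketched below. The symbols $A$, $b$, $x$, $y$ are assumptions introduced here, and other equivalent variants exist. Let $A \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^m$. Then exactly one of the following two systems has a solution:

$$\text{(1)}\quad A x = b,\ x \ge 0 \qquad\qquad \text{(2)}\quad A^{\mathsf T} y \ge 0,\ b^{\mathsf T} y < 0.$$

Both cannot hold at once: if they did, then $b^{\mathsf T} y = x^{\mathsf T} A^{\mathsf T} y \ge 0$, which contradicts $b^{\mathsf T} y < 0$.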

Applications of theorems of the alternative: https://link.springer.com/article/10.1007/BF00939083

interior-point methods:
"Interior-point methods (also referred to as barrier methods or IPMs) are a certain class of algorithms that solve linear and nonlinear convex optimization problems."
https://en.wikipedia.org/wiki/Interior-point_method
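
The barrier idea can be sketched on a one-dimensional toy problem: minimize x subject to x >= 1, whose solution is x = 1. The constraint is replaced by a logarithmic barrier, so for a parameter t we minimize t*x - log(x - 1) over x > 1, and t is increased step by step. The problem, the function name `barrier_minimize`, and all parameter values are assumptions chosen for illustration; real interior-point solvers are considerably more elaborate.

```python
def barrier_minimize(t, x, newton_iters=50):
    """Minimize t*x - log(x - 1) over x > 1 with damped Newton steps."""
    for _ in range(newton_iters):
        grad = t - 1.0 / (x - 1.0)        # first derivative
        hess = 1.0 / (x - 1.0) ** 2       # second derivative (always positive)
        step = -grad / hess               # full Newton step
        # Halve the step until the new point stays strictly feasible (x > 1).
        while x + step <= 1.0:
            step *= 0.5
        x += step
        if abs(step) < 1e-12:
            break
    return x

x = 2.0   # strictly feasible starting point
t = 1.0   # barrier parameter; larger t puts more weight on the true objective
path = []
while t <= 1e6:
    x = barrier_minimize(t, x)
    path.append((t, x))
    t *= 10.0

print(path[-1])
```

For this toy problem the barrier minimizer is x = 1 + 1/t, so the iterates stay strictly inside the feasible region and approach the true solution x = 1 as t grows.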

Applications of signal processing:
"Applications of DSP include audio signal processing, audio compression, digital image processing, video compression, speech processing, speech recognition, digital communications, digital synthesizers, radar, sonar, financial signal processing, seismology and biomedicine."
Digital signal processing - Wikipedia
https://en.wikipedia.org/wiki/Digital_signal_processing

Applications of optimization to signal processing:
"Convex optimization has been used in signal processing for a long time, to choose coefficients for use in fast (linear) algorithms, such as in filter or array design; more recently, it has been used to carry out (nonlinear) processing on the signal itself."
https://web.stanford.edu/~boyd/papers/rt_cvx_sig_proc.html

Kinect Audio Signal Optimization:
https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/ivantash-optimization_methods_and_their_applications_in_dsp.pdf

statistics and machine learning:
"The Actual Difference Between Statistics and Machine Learning
https://towardsdatascience.com › the-actual-difference-between-statistics-an...
Mar 24, 2019 - “The major difference between machine learning and statistics is their purpose. Machine learning models are designed to make the most accurate predictions possible. Statistical models are designed for inference about the relationships between variables.” ... Statistics is the mathematical study of data."

Machine Learning vs Statistics - KDnuggets
Machine learning is all about predictions, supervised learning, and unsupervised learning, while statistics is about sample, population, and hypotheses. But are ...
https://www.kdnuggets.com/2016/11/machine-learning-vs-statistics.html

The Close Relationship Between Applied Statistics and Machine Learning
https://machinelearningmastery.com/relationship-between-applied-statistics-and-machine-learning/

Control and mechanical engineering:
"Control engineering is the engineering discipline that focuses on the modeling of a diverse range of dynamic systems (e.g. mechanical systems) and the design of controllers that will cause these systems to behave in the desired manner. ... In most cases, control engineers utilize feedback when designing control systems." https://en.wikipedia.org/wiki/Control_engineering

Digital and analog circuit design:
"With the advent of logic synthesis, one of the biggest challenges faced by the electronic design automation (EDA) industry was to find the best netlist representation of the given design description. While two-level logic optimization had long existed in the form of the Quine–McCluskey algorithm, later followed by the Espresso heuristic logic minimizer, the rapidly improving chip densities, and the wide adoption of HDLs for circuit description, formalized the logic optimization domain as it exists today." https://en.wikipedia.org/wiki/Logic_optimization

The analysis and optimization algorithms of the electronic circuits design
https://www.researchgate.net/publication/269211254_The_analysis_and_optimization_algorithms_of_the_electronic_circuits_design

Optimization Methods in Finance
http://web.math.ku.dk/~rolf/CT_FinOpt.pdf

Optimization Models and Methods with Applications in Finance: http://www.bcamath.org/documentos_public/courses/Nogales_2012-13_02_18-22.pdf

Optimization for financial engineering: a special issue: https://link.springer.com/article/10.1007/s11081-017-9358-1

----

Sayed Ahmed

BSc. Eng. in Comp. Sc. & Eng. (BUET)
MSc. in Comp. Sc. (U of Manitoba, Canada)
MSc. in Data Science and Analytics (Ryerson University, Canada)
Linkedin: https://ca.linkedin.com/in/sayedjustetc

Blog: http://Bangla.SaLearningSchool.com, http://SitesTree.com
Online and Offline Training: http://Training.SitesTree.com

Get access to courses on Big Data, Data Science, AI, Cloud, Linux, System Admin, Web Development and Misc. related. Also, create your own course to sell to others to earn a revenue.
http://sitestree.com/training/

If you want to contribute to the operation of this site (Bangla.SaLearn) including occasional free and/or low cost online training (using Zoom.us): http://Training.SitesTree.com (or charitable/non-profit work in the education/health/social service sector), you can financially contribute to: safoundation at salearningschool.com using Paypal or Credit Card (on http://sitestree.com/training/enrol/index.php?id=114 ).

Affiliate Links: Deals on Amazon :
Hottest Deals on Amazon USA: http://tiny.cc/38lddz

Hottest Deals on Amazon CA: http://tiny.cc/bgnddz

Hottest Deals on Amazon Europe: http://tiny.cc/w4nddz

AI ML DS RL DL NN NLP Data Mining Optimization, Math and Statistics for Data Science, and Engineering, ব্লগ । Blog

Misc. Statistics, Engineering, and Sensors

November 16, 2019 Sayed

Learn more about Nonparametric Test
https://www.sciencedirect.com/topics/medicine-and-dentistry/nonparametric-test

"Sensor Management for Large-Scale Multisensor-Multitarget Tracking," in Integrated Tracking, Classification, and Sensor Management: Theory and Applications
http://download.e-bookshelf.de/download/0000/7142/31/L-G-0000714231-0002366034.pdf

Approaches to Multisensor Data Fusion in Target Tracking: A Survey
https://www.computer.org/csdl/journal/tk/2006/12/k1696/13rRUxBa56w

Sensor fusion
https://en.wikipedia.org/wiki/Sensor_fusion

Sensor Fusion: Sensor fusion is the process of merging data from multiple sensors so as to reduce the uncertainty involved in robot navigation or task performance.
https://www.sciencedirect.com/topics/engineering/sensor-fusion
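
One simple way to see how merging sensor data reduces uncertainty is inverse-variance weighting. This is a standard textbook technique swapped in here for illustration, not something prescribed by the linked pages, and it assumes independent, unbiased measurements with known variances. The readings and the function name `fuse` are hypothetical.

```python
def fuse(measurements):
    """Inverse-variance weighted fusion of independent (value, variance) readings."""
    weights = [1.0 / variance for _, variance in measurements]
    total_weight = sum(weights)
    fused_value = sum(w * value for w, (value, _) in zip(weights, measurements)) / total_weight
    fused_variance = 1.0 / total_weight
    return fused_value, fused_variance

# Hypothetical range readings (metres) from two sensors with different noise levels.
readings = [(10.2, 0.5), (9.8, 0.25)]
value, variance = fuse(readings)
print(value, variance)
```

The fused variance is smaller than the variance of either individual sensor, and the fused value lies closer to the reading from the less noisy sensor.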

Sensor Fusion Tutorials and Applications
http://fusion.isif.org/conferences/fusion2017/Tutorials.html


AI ML DS RL DL NN NLP Data Mining Optimization, Math and Statistics for Data Science, and Engineering, ব্লগ । Blog

Statistics: Data Science: Kaiser-Meyer-Olkin (KMO) Test

July 7, 2019 Sayed

Kaiser-Meyer-Olkin (KMO) Test for Sampling Adequacy

https://www.statisticshowto.datasciencecentral.com/kaiser-meyer-olkin/

KMO and Bartlett's Test

https://www.ibm.com/support/knowledgecenter/SSLVMB_23.0.0/spss/tutorials/fac_telco_kmo_01.html

What should be ideal KMO value for factor analysis?

https://www.researchgate.net/post/What_should_be_ideal_KMO_value_for_factor_analysis

Math and Statistics for Data Science, and Engineering, ব্লগ । Blog

Implement Gradient Descent:

January 5, 2019 Sayed

"Gradient descent is a first-order iterative optimization algorithm for finding the minimum of a function. To find a local minimum of a function using gradient descent, one takes steps proportional to the negative of the gradient (or approximate gradient) of the function at the current point."

Gradient descent - Wikipedia

https://en.wikipedia.org/wiki/Gradient_descent

Gradient Descent

# From calculation, it is expected that the local minimum occurs at x=9/4

"""
cur_x = 6  # The algorithm starts at x=6
gamma = 0.01  # step size multiplier
precision = 0.00001
previous_step_size = 1
max_iters = 10000  # maximum number of iterations
iters = 0  # iteration counter

df = lambda x: 4 * x**3 - 9 * x**2

while previous_step_size > precision and iters < max_iters:
    prev_x = cur_x
    cur_x -= gamma * df(prev_x)
    previous_step_size = abs(cur_x - prev_x)
    iters += 1

print("The local minimum occurs at", cur_x)
# The output for the above will be: The local minimum occurs at 2.2499646074278457
"""

#----

print('my part')

co_ef = 6  # starting point
iters = 0  # iteration counter
max_iter = 1000  # maximum number of iterations
gamma = 0.001  # step size multiplier
step = 1
precision = 0.0000001

# Derivative of f(x) = x**4 - 3 * x**3
df = lambda x: 4 * x * x * x - 9 * x * x

# Stop when the step becomes small enough or the iteration budget is used up
while iters < max_iter and step >= precision:
    prev_co_ef = co_ef
    co_ef -= gamma * df(prev_co_ef)
    step = abs(prev_co_ef - co_ef)
    iters += 1

print(co_ef)


Hermitian Matrix

Definiteness of a matrix

Singular value decomposition

PCA using Python (scikit-learn)

Advances in Missile Guidance, Control, and Estimation

What is the Inverse of a Matrix?

What is Linear programming?

Parametric forms for lines and vectors

Solving Systems of Linear Equations Using Matrices

Row and column spaces

Infimum and supremum

Concave Upward and Downward

Introduction to Bayesian Inference

Bayesian Linear Regression

Bayesian model selection

Expectation propagation - Wikipedia

What is machine learning optimization?

Ordered vector space

Vector Ordering

Design optimization

Learning from nature

The World is for Polymaths: An Interview with Sajid Amit, Academic, Researcher, and Development Strategist (Part One)

Quadratic form

Quartic function

Quartic plane curve

What is Correlation?

Correlation in Linear Regression:

Joint Distributions and Independence

Characteristic function (probability theory)

Functions of random vectors and their distribution

Probabilistic statement

Ref: https://en.wikipedia.org/wiki/Chebyshev%27s_inequality

Markov's inequality

Statement

Statistical Significance Tests for Comparing Machine Learning Algorithms

Probability mass function

4.3.1 Mixed Random Variables

Kaiser-Meyer-Olkin (KMO) Test for Sampling Adequacy

KMO and Bartlett's Test

What should be ideal KMO value for factor analysis?

"