Bruno Després - Neural Networks and Numerical Analysis

De Gruyter Series in Applied and Numerical Mathematics

Edited by

Rémi Abgrall
José Antonio Carrillo de la Plata
Jean-Michel Coron
Athanassios S. Fokas
Irene Fonseca

Volume

ISBN 9783110783124

e-ISBN (PDF) 9783110783186

e-ISBN (EPUB) 9783110783261

Bibliographic information published by the Deutsche Nationalbibliothek

The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data are available on the Internet at http://dnb.dnb.de.

2022 Walter de Gruyter GmbH, Berlin/Boston

Mathematics Subject Classification 2020: 68T07, 65Z05, 68T09, 65L99,

1 Objective functions, neural networks, and linear algebra

We share a philosophy about linear algebra: we think basis-free, we write basis-free, but when the chips are down we close the office door and compute with matrices like fury.

Irving Kaplansky

The basic problem is formulated in the context of interpolation of functions. Let $f_{\mathrm{obj}}$ be an objective function modeling some problem of interest,

(1.1) $f_{\mathrm{obj}} : \mathbb{R}^m \to \mathbb{R}^n$.

Let $x_i$ be an interpolation point in $\mathbb{R}^m$. Let $\varepsilon_i \in \mathbb{R}^n$ denote a noise term (small enough in a sense to be specified). Noise is an important notion for real data, because real measurements or real data are never perfect. Set

(1.2) $y_i = f_{\mathrm{obj}}(x_i) + \varepsilon_i \in \mathbb{R}^n$.

The pair $(x_i, y_i) \in \mathbb{R}^m \times \mathbb{R}^n$ is called interpolation data with noise $\varepsilon_i \in \mathbb{R}^n$. If $\varepsilon_i = 0$, then the interpolation data is noiseless.

The finite collection of interpolation data is the dataset

(1.3) $D = \{(x_i, y_i),\ i = 1, \dots, N\} \subset \mathbb{R}^m \times \mathbb{R}^n$.

Starting from a given dataset D, a general question is how to implement on a computer an approximation f of the objective function $f_{\mathrm{obj}}$. Some evident difficulties in the design of f are the level of the noise, the curse of dimensionality (that is, m and/or n take high values), the construction of good datasets (good datasets sample the objective function in meaningful regions), the fact that the implementation must be efficient for large $N = \#(D)$, and the quality of the reconstructed function. Other aspects are developed in the notes.
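To make the setup concrete, here is a minimal NumPy sketch (not taken from the book) of how a dataset of the form (1.3) could be generated; the objective function f_obj below, the dimensions m and n, and the noise level are hypothetical choices made only for illustration.

```python
# Build a noisy dataset D = {(x_i, y_i)} as in (1.1)-(1.3).
import numpy as np

rng = np.random.default_rng(0)
m, n, N = 2, 1, 200                               # input dim, output dim, data size

def f_obj(x):                                     # hypothetical objective R^m -> R^n
    return np.array([np.sin(x[0]) + 0.5 * x[1]])

X = rng.uniform(-1.0, 1.0, size=(N, m))           # interpolation points x_i
noise = 0.01 * rng.standard_normal((N, n))        # small noise terms eps_i
Y = np.array([f_obj(x) for x in X]) + noise       # y_i = f_obj(x_i) + eps_i
dataset = list(zip(X, Y))                         # D = {(x_i, y_i), i = 1, ..., N}
```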

At the end of the chapter, the structure of basic NNs is defined.

1.1 Notations

Standard linear algebra notations are used. For an arbitrary vector $z \in \mathbb{R}^p$ in a space of arbitrary dimension p, we denote its square $|z|^2 = \sum_{i=1}^{p} |z_i|^2 \ge 0$. We have $|z|^2 = \langle z, z \rangle$, where the scalar product $\langle \cdot\,, \cdot \rangle$ is defined by

$\langle a, b \rangle = \sum_{i=1}^{p} a_i b_i, \qquad a, b \in \mathbb{R}^p.$

Let us consider two matrices $M, N \in M_{mn}(\mathbb{R})$, two vectors $a, b \in \mathbb{R}^m$, and one vector $c \in \mathbb{R}^n$ such that $b = Mc$. The contraction of matrices is denoted with the equivalent notations

(1.4) $\langle M, N \rangle = M : N = \sum_{i=1}^{m} \sum_{j=1}^{n} m_{ij} n_{ij} = M^t : N^t = \langle M^t, N^t \rangle.$

The tensorial product of two vectors is denoted $a \otimes c = a c^t \in M_{mn}(\mathbb{R})$. We have

(1.5) $\langle a, b \rangle = \langle a, Mc \rangle = \langle a \otimes c, M \rangle = a \otimes c : M.$

A norm for matrices is defined as

(1.6) $\|M\| = \sup_{|y| = 1,\, |x| = 1} \langle y, Mx \rangle = \sup_{|y| = 1,\, |x| = 1} y \otimes x : M,$

which has the property $\|AB\| \le \|A\|\, \|B\|$.
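The notation can be checked numerically; the short NumPy script below (an illustration, not part of the book) verifies (1.4), (1.5), and the submultiplicativity of the operator norm (1.6) on random matrices and vectors, using the spectral norm np.linalg.norm(·, 2).

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 3, 4
M = rng.standard_normal((m, n))
N = rng.standard_normal((m, n))
a = rng.standard_normal(m)
c = rng.standard_normal(n)

# (1.4): the contraction M : N equals tr(M^t N)
assert np.isclose(np.sum(M * N), np.trace(M.T @ N))

# (1.5): <a, Mc> = (a tensor c) : M with a tensor c = a c^t
assert np.isclose(a @ (M @ c), np.sum(np.outer(a, c) * M))

# (1.6): the operator norm satisfies ||AB|| <= ||A|| ||B||
A, B = rng.standard_normal((3, 3)), rng.standard_normal((3, 3))
assert np.linalg.norm(A @ B, 2) <= np.linalg.norm(A, 2) * np.linalg.norm(B, 2) + 1e-12
```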

Lemma 1.1.1.

Let $A \in M_{pq}(\mathbb{R})$, $B \in M_{qr}(\mathbb{R})$, and $C \in M_{pr}(\mathbb{R})$. Then we have

$AB : C = \langle AB, C \rangle = \langle A, C B^t \rangle = A : C B^t.$
Proof.

We can rely on the following property. Let $M, N \in M_{mn}(\mathbb{R})$ be two matrices. With the notation of the trace of a square matrix, which is the sum of its diagonal elements, we have $M : N = \operatorname{tr}(M^t N)$. We obtain

$AB : C = \operatorname{tr}\big((AB)^t C\big) = \operatorname{tr}(B^t A^t C).$

A known result in linear algebra is that matrices commute under the trace operator:

$AB : C = \operatorname{tr}(A^t C B^t) = \operatorname{tr}\big(A^t (C B^t)\big) = A : C B^t.$
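As a quick sanity check (again a NumPy illustration, not part of the book), the identity of Lemma 1.1.1 holds on random matrices of compatible sizes:

```python
import numpy as np

rng = np.random.default_rng(2)
p, q, r = 3, 4, 5
A = rng.standard_normal((p, q))
B = rng.standard_normal((q, r))
C = rng.standard_normal((p, r))

lhs = np.sum((A @ B) * C)     # AB : C
rhs = np.sum(A * (C @ B.T))   # A : C B^t
assert np.isclose(lhs, rhs)
```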

Matrices with more than two axes of coefficients are called tensors. They will be introduced in the section on convolutional neural networks.

1.2 The least squares method

A first ingredient is the least squares method. Consider a linear function f

(1.7) $f : \mathbb{R}^m \to \mathbb{R}^n, \qquad x \mapsto f(x) = Wx + b,$

where the parameters are represented by $W \in M_{nm}(\mathbb{R})$, which is called the matrix of weights, and $b \in \mathbb{R}^n$, which is called the bias or the offset. Consider now the function

(1.8) $J : M_{nm}(\mathbb{R}) \times \mathbb{R}^n \to \mathbb{R}, \qquad (W, b) \mapsto \sum_{(x, y) \in D} |Wx + b - y|^2.$

The space of parameters is $Q = M_{nm}(\mathbb{R}) \times \mathbb{R}^n \simeq M_{n, m+1}(\mathbb{R})$. A way to minimize the difference between $f_{\mathrm{obj}}$ encoded in the dataset D and the function f is to determine the parameters that minimize J. That is, we consider an optimal value $(W^*, b^*) \in Q$ in the sense of least squares:

(1.9) $J(W^*, b^*) \le J(W, b) \qquad \text{for all } (W, b) \in Q.$
Lemma 1.2.1.

There exists an optimal value

$(W^*, b^*) = \operatorname{argmin}_{(W, b) \in Q} J(W, b) \in Q.$

If the vectors $\Big\{ \begin{pmatrix} x_i \\ 1 \end{pmatrix} \Big\}_{i=1}^{N}$ span the whole space $\mathbb{R}^{m+1}$, then the optimal value is unique.

Remark 1.2.2.

For many problems in data science, we have $\#(D) \gg 1$, so the last assumption is reasonable.
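The spanning condition can also be tested directly on a given dataset: stacking the extended vectors $(x_i, 1)$ as rows, the condition holds exactly when the resulting $N \times (m+1)$ matrix has rank $m+1$. A NumPy sketch with hypothetical data:

```python
import numpy as np

rng = np.random.default_rng(3)
N, m = 200, 2
X = rng.uniform(-1.0, 1.0, size=(N, m))       # interpolation points x_i
X_ext = np.hstack([X, np.ones((N, 1))])       # rows are the extended vectors (x_i, 1)
print(np.linalg.matrix_rank(X_ext) == m + 1)  # True: the optimal value is unique
```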

Remark 1.2.3.

The minimization problem (1.9) can be decoupled into n simpler scalar least squares problems (one scalar problem per component of the output). In contrast, the proof below takes a more global approach, which has the advantage of introducing some basic tools that will be reused later.
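A small NumPy illustration (not from the book) of this decoupling: solving for all output components at once and solving n scalar least squares problems, one per output component, produce the same optimal parameters.

```python
import numpy as np

rng = np.random.default_rng(4)
m, n, N = 3, 2, 100
X = rng.standard_normal((N, m))                      # rows are x_i
Y = rng.standard_normal((N, n))                      # rows are y_i
X_ext = np.hstack([X, np.ones((N, 1))])              # extended inputs (x_i, 1)

# joint solve: all output components at once, solution of shape (m+1, n)
P_joint, *_ = np.linalg.lstsq(X_ext, Y, rcond=None)

# decoupled solve: one scalar least squares problem per output component
P_rows = np.column_stack(
    [np.linalg.lstsq(X_ext, Y[:, k], rcond=None)[0] for k in range(n)]
)
assert np.allclose(P_joint, P_rows)
```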

Proof.

The optimal matrix of weights and the optimal bias are constructed explicitly. The proof is split into smaller steps.

  • In the first step, we freeze b=0 and look for the optimal value of W.

  • In the second step, both b and W can take any values.

  • In the third step, the hypothesis that the vectors $x_i$ span the whole space is relaxed.

First step. Consider the function $J(W) = \frac{1}{2} \sum_{i=1}^{N} |W x_i - y_i|^2$, where the factor $\frac{1}{2}$ is just for convenience. Set $h(\lambda) = J(W + \lambda Z)$, where $Z \in M_{nm}(\mathbb{R})$, $Z \neq 0$, is an arbitrary nonzero matrix. At a local minimum W, the derivative of h vanishes whatever Z is. Therefore we have

(1.10) $\lim_{\lambda \to 0} \frac{h(\lambda) - h(0)}{\lambda} = \sum_{i=1}^{N} \langle W x_i - y_i, Z x_i \rangle = 0.$

The scalar product of vectors can be rewritten as a contraction of matrices, using the tensor product (1.5):

$\sum_{i=1}^{N} \langle W x_i - y_i, Z x_i \rangle = \sum_{i=1}^{N} \langle (W x_i - y_i) \otimes x_i, Z \rangle = 0.$

Since Z is arbitrary, we get $\sum_{i=1}^{N} (W x_i - y_i) \otimes x_i = 0$. Taking the transpose of each term, more transformations yield

$\sum_{i=1}^{N} x_i (W x_i - y_i)^t = \sum_{i=1}^{N} \big( x_i x_i^t W^t - x_i y_i^t \big) = \Big( \sum_{i=1}^{N} x_i \otimes x_i \Big) W^t - \sum_{i=1}^{N} x_i \otimes y_i = 0.$

This is rewritten after transposition as

(1.11) $W M = B, \qquad M = \sum_{i=1}^{N} x_i \otimes x_i, \qquad B = \sum_{i=1}^{N} y_i \otimes x_i.$

As in the last assumption of the lemma, let us assume that the vectors $x_i$ span the whole space $\mathbb{R}^m$. It yields that the matrix $M = \sum_{i=1}^{N} x_i \otimes x_i$ is invertible. We get

(1.12) $W = B M^{-1} \in M_{nm}(\mathbb{R}).$

The function $h(\lambda) = J(W + \lambda Z)$ is a polynomial of degree $\deg(h) \le 2$ with respect to $\lambda$. Therefore

(1.13) $h(\lambda) = h(0) + \lambda h'(0) + \frac{1}{2} \lambda^2 h''(0) = h(0) + \frac{1}{2} \Big( \sum_{i=1}^{N} |Z x_i|^2 \Big) \lambda^2 \ge h(0)$

because $h'(0) = 0$ by the definition of the matrix W. So $J(W + \lambda Z) \ge J(W)$ for all possible $\lambda$ and Z, and W is indeed a minimum argument of the function J.
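Formula (1.12) can be verified numerically. The sketch below (a NumPy illustration with hypothetical data, bias frozen at zero as in this first step) assembles M and B from a dataset and compares $W = BM^{-1}$ with the least squares solution returned by np.linalg.lstsq.

```python
import numpy as np

rng = np.random.default_rng(5)
m, n, N = 3, 2, 50
X = rng.standard_normal((N, m))                  # rows are x_i
Y = rng.standard_normal((N, n))                  # rows are y_i

M = X.T @ X                                      # M = sum_i x_i tensor x_i
B = Y.T @ X                                      # B = sum_i y_i tensor x_i
W = B @ np.linalg.inv(M)                         # W = B M^{-1}, requires M invertible

# reference solution of min_W sum_i |W x_i - y_i|^2 (lstsq returns W^t)
W_ref, *_ = np.linalg.lstsq(X, Y, rcond=None)
assert np.allclose(W, W_ref.T)
```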

Second step. Now b can be nonzero. Let us introduce the extended notation

(1.14) $\widetilde{W} = (W \;\; b) \in M_{n, m+1}(\mathbb{R})$

and the extended input

(1.15) $\widetilde{x}_i = \begin{pmatrix} x_i \\ 1 \end{pmatrix}.$

By construction, $\widetilde{W} \widetilde{x}_i = W x_i + b$. Therefore we can define the extended function

(1.16) $\widetilde{J}(\widetilde{W}) = \frac{1}{2} \sum_{i=1}^{N} |\widetilde{W} \widetilde{x}_i - y_i|^2 = \frac{1}{2} \sum_{i=1}^{N} |W x_i + b - y_i|^2.$

Under the condition of the lemma that the extended vectors $\widetilde{x}_i$ span the whole space $\mathbb{R}^{m+1}$, the first step yields the formula for the optimal solution

(1.17) $\widetilde{W} = \widetilde{B} \widetilde{M}^{-1} = \Big( \sum_{i=1}^{N} y_i \otimes \widetilde{x}_i \Big) \Big( \sum_{i=1}^{N} \widetilde{x}_i \otimes \widetilde{x}_i \Big)^{-1} \in M_{n, m+1}(\mathbb{R}).$
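In the same spirit, the sketch below (NumPy, hypothetical data) applies formula (1.17): the extended inputs $(x_i, 1)$ are stacked, the extended normal equations are solved, and the weights and the bias are read off from the columns of the extended matrix.

```python
import numpy as np

rng = np.random.default_rng(6)
m, n, N = 3, 2, 50
X = rng.standard_normal((N, m))
Y = X @ rng.standard_normal((m, n)) + rng.standard_normal(n)  # exactly affine data

X_ext = np.hstack([X, np.ones((N, 1))])      # rows are the extended inputs (x_i, 1)
M_ext = X_ext.T @ X_ext                      # sum_i (x_i, 1) tensor (x_i, 1)
B_ext = Y.T @ X_ext                          # sum_i y_i tensor (x_i, 1)
W_ext = B_ext @ np.linalg.inv(M_ext)         # extended matrix (W  b), shape (n, m+1)

W, b = W_ext[:, :m], W_ext[:, m]             # recover the weights and the bias
assert np.allclose(X @ W.T + b, Y)           # the affine data is reproduced exactly
```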

Third step. So far, the proof relied on the spanning assumption, which is useful to obtain the invertibility of the matrix M in (1.11) and of the matrix $\widetilde{M}$ in (1.17). Let us now show that, without this assumption, the equation $WM = B$ in (1.11) still has at least one solution.

Consider the linear operator

$L : M_{nm}(\mathbb{R}) \to M_{nm}(\mathbb{R}), \qquad W \mapsto L(W) = WM.$

We want to show that $B \in \operatorname{ran}(L)$, where $\operatorname{ran}(L)$ is the range of the operator L.

A fundamental linear algebra property is $\operatorname{ran}(L) = \ker(L^t)^{\perp}$, where the orthogonality is defined with respect to the contraction of matrices (the operator $L^t$ is the symmetric or adjoint operator of L). By Lemma 1.1.1 we obtain
