The Big R-Book - Philippe J. S. De Brouwer
Preface
The author has written this book based on experience that spans roughly three decades in insurance, banking, and asset management. During his career, the author worked in IT, structured and managed highly technical investment portfolios (at some point overseeing €24 billion in a thousand investment funds), fulfilled many C-level roles (e.g. was CEO of KBCTFI SA [an asset manager in Poland], was CIO and COO for Eperon SA [a fund manager in Ireland], sat on boards of investment funds, and was involved in big-data projects in London), and did quantitative analysis in risk departments of banks. This gave the author a unique and in-depth view of many areas, ranging from analytics, big data, and databases to business requirements, financial modelling, etc.
In this book, the author presents a structured overview of his knowledge and experience for anyone who works with data, and invites the reader to understand the bigger picture and discover new aspects. This book also demystifies the hype around machine learning and AI by helping the reader to understand the models and program them in R without spending too much time on the theory.
This book aims to be a starting point for quants, data scientists, modellers, etc. It aims to be the book that bridges different disciplines so that a specialist in one domain can grab this book, understand how his/her discipline fits in the bigger picture, and get enough material to understand the person who is specialized in a related discipline. Therefore, it could be the ideal book to help you make a career move to another discipline, so that in a few years you are the person who understands the whole data-chain. In short, the author wants to give you a short-cut to the knowledge that he spent 30 years accumulating.
Another important point is that this book is written by and for practitioners: people who work with data, programming, and mathematics for a living in a corporate environment. So, this book will be most interesting for anyone interested in data science, machine learning, statistical learning, and mathematical modelling, and for whoever wants to convey technical matters in a clear and concise way to non-specialists.
This also means that this book is not necessarily the best book in any of the disciplines that it spans. In every specialisation there are already good contenders.
More formal introductions to statistics can be found, for example, in Cyganowski, Kloeden, and Ombach (2001) and Andersen et al. (1987). There are also many books about specific stochastic processes and their applications in financial markets: see e.g. Wolfgang and Baschnagel (1999), Malliaris and Brock (1982), and Mikosch (1998). While knowledge of stochastic processes and their role in asset pricing is important, it covers only a very narrow range of applications and theory. This book is more general, gentler on theoretical foundations, and focusses more on the use of data to answer real-life problems in an everyday business environment.
A comprehensive introduction to statistics or econometrics can be found in Peracchi (2001) or Greene (1997). A general and comprehensive introduction to statistics is also given in Neter, Wasserman, and Whitmore (1988).
This is not simply a book about programming and/or any related techniques. If you just want to learn programming in R, then Grolemund (2014) will get you started faster. Our Part II will also get you started in programming, though it assumes a certain familiarity with programming and mainly zooms in on aspects that will be important in the rest of the book.
This book is not a comprehensive book about financial modelling. Other books do a better job of listing all types of possible models. No book does a better job here than Bernard Marr's publication: Marr (2016): “Key Business Analytics, the 60+ business analysis tools every manager needs to know.” That book lists all the words that some managers might use and what they mean, without any of the mathematics or programming behind them. I warmly recommend keeping that book next to ours. Whenever someone comes up with a term like “customer churn analytics”, for example, you can use Bernard's book to find out what it actually means and then turn to ours to “get your hands dirty” and actually do it.
If you are only interested in statistical learning and modelling, you will find the following books more focused: Hastie, Tibshirani, and Friedman (2009) or James, Witten, Hastie, and Tibshirani (2013), who also use R.
A more in-depth introduction to AI can be found in Russell and Norvig (2016).
Data science is treated more elaborately in Baesens (2014) and in the recent book by Wickham and Grolemund (2016), which provides an excellent introduction to R and data science in general. This last book is a great add-on to this book as it focusses more on the data aspects (but less on the statistical learning part). We also focus more on the practical aspects and real data problems in a corporate environment.
A book that comes close to ours in purpose is the one that my friend professor Bart Baesens has compiled, “Analytics in a Big Data World, the Essential guide to data science and its applications”: Baesens (2014). If the mathematics, programming, and R itself scare you in this book, then Bart's book is for you. Bart's book covers different methods but, above all, it is sufficient for the reader to be able to use a spreadsheet to do some basic calculations. Therefore, it will not help you to tackle big data or to program a neural network yourself, but you will understand very well what these things mean and how they work.
Another book that might work well if the maths in this one is prohibitive to you is Provost and Fawcett (2013). It will give you some insight into what statistical learning is and how it works, but it will not prepare you to use it on real data.
Summarizing, I suggest that next to this book you also buy Marr (2016) and Baesens (2014). This will provide you with a complete chain: from business and buzzwords (Bernard's book), over understanding what modelling is and what practical issues one will encounter (Bart's book), to implementing this in a corporate setting and solving the practical problems of a data scientist and modeller on sizeable data (this book).
In a nutshell, this book does it all: it is gentle on theoretical foundations and aims to be a one-stop shop to show the big picture, learn all those things, and actually apply them. It aims to serve as a basis when later picking up more advanced books in certain narrow areas. This book will take you on a journey of working with data in a real company, and hence it will also discuss practical problems such as people filling in forms or extracting data from a SQL database.
It should be readable for any person who has finished (or is finishing) a university-level education in a quantitative field such as physics, civil engineering, mathematics, econometrics, etc. It should also be readable by the senior manager with a technical background who tries to understand what his army of quants, data scientists, and developers are up to, while having fun learning R. After reading this book you will be able to talk to all of them, challenge their work, and do most of the analysis yourself, or be part of a bigger entity and specialize in one of the steps of modelling or data manipulation.
In some way, this book can also be seen as a celebration of FOSS (Free and Open Source Software). We proudly mention that no commercial software was used at all for this book. The operating system is Linux, the window manager is Fluxbox (sometimes LXDE or KDE), Kile and vi helped the editing process, Okular displayed the PDF file, even the database servers and Hadoop/Spark are FOSS ... and of course R and LaTeX provided the icing on the cake. FOSS makes this world a more inclusive place as it makes technology more attainable in poorer parts of the world.
Hence, we extend warm thanks to all the people who spend so much time contributing to free software.