The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition (Springer Series in Statistics)
During the past decade there has been an explosion in computation and information technology. With it have come vast amounts of data in a variety of fields such as medicine, biology, finance, and marketing. The challenge of understanding these data has led to the development of new tools in the field of statistics, and spawned new areas such as data mining, machine learning, and bioinformatics. Many of these tools have common underpinnings but are often expressed with different terminology. This…
Rating: (out of 38 reviews)
List Price: $ 89.95
Price: $ 66.96
Review by frank lindemann for The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition (Springer Series in Statistics)
Rating:
I use data mining tools in my financial engineering and financial modeling work, and I have found this book to be very useful. It provides two crucial types of information. First, it provides enough theory to allow a potential user to understand the essential insights that motivate specific techniques and to evaluate the situations in which those techniques are appropriate. Second, the book gives the exact algorithms to implement the various techniques.
While no book I have seen covers every data mining methodology available, this one has the strongest coverage I have seen in additive models, non-linear regression, and CART/MART (regression/classification trees). It also has very strong coverage in many other areas. I highly recommend it.
Review by R. Krause for The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition (Springer Series in Statistics)
Rating:
The book is written by some of the biggest names currently in the field, and is accordingly pitched at a certain level; this isn't a fault of the book or the authors, but rather a consequence of its being written for a specific audience. However, I did find it odd that they would occasionally explain basic, readily known notation, yet later assume the reader is familiar with what I would regard as advanced notation, or leave out quite a few steps in their mathematics, assuming the reader understands what they did. The book covers a wide range of techniques, from the more traditional to the current, and for each topic presents an overview of the technique and provides adequate references for further exploration.
The reader should have a good underlying understanding of linear algebra, statistics, and probability theory, and should already be familiar with the techniques presented here. This book was used in a graduate engineering data mining class, and most of us struggled greatly with it. It probably would have been more appropriate as a book to augment another text, or if this had not been our first exposure to the topics; relying on it to explain neural networks, support vector machines, and the like when you've never seen them before makes for a very bewildering experience. Once you find a few journal articles, though, the techniques actually are fairly easy to understand.
The book does not explain how to implement any of the techniques in software; that topic is left to other books, such as Modern Applied Statistics with S by Venables and Ripley. Only in the discussion of the Apriori algorithm for association rules did I see them name a software package. It would have been nice if they had given some insight into how they created some of the great graphics that punctuate the book, perhaps as additional material on the website.
A book that is more down to earth for engineers, albeit different in scope, would be Duda and Hart's Pattern Classification; the authors, I believe, are electrical engineers, and the book is written more from an engineering standpoint. In addition, the Duda and Hart book gives a lot of applications-based problems and has an associated MATLAB handbook to walk readers through building many types of learners, whereas in this book the end-of-chapter exercises are almost exclusively proofs and theoretical exercises. That is not a fault of the book, but rather just a difference, and it depends on what the reader wants to get out of it.
Ultimately, even though it did prove to be a rather confusing book, I have learned a lot from it and will continue to go through it to learn even more, as it does tend to become more lucid with each pass.
Review by for The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition (Springer Series in Statistics)
Rating:
The book by Hastie, Tibshirani and Friedman is a welcome addition to the quickly growing area of machine learning and data mining. This is a well written book, laid out nicely with excellent examples by three well established researchers in the field. It will be helpful to those who are interested in learning about this field, as well as to experts who want to know more.
My only complaint is that although the authors do make an honest attempt to clearly highlight methods that are based on their own research, often this distinction becomes cloudy and the reader is left with the impression that the methods advocated are the best and represent the standard in the industry. In fact, many of their ideas are only heuristic, and it is more than conceivable that these will eventually be superseded by better methods.
A good book, which gets you up to speed in the literature, but it will only be relevant for a few years.
Review by Michael R. Chernick for The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition (Springer Series in Statistics)
Rating:
Data mining is a field developed by computer scientists, but many of its crucial elements are embedded in important and subtle statistical concepts. Statisticians can play an important role in the development of this field, but as was the case with artificial intelligence, expert systems, and neural networks, the statistical research community has been slow to respond. Hastie, Tibshirani and Friedman are changing this.
Friedman has been a major player in pattern recognition of high dimensional data, in tree classification, regularized discriminant analysis and multivariate adaptive regression splines. He has also done some exciting new research on boosting methods.
Hastie and Tibshirani invented additive models, which are very general types of regression models. Tibshirani invented the lasso method and is a leader among researchers on the bootstrap. Hastie invented principal curves and surfaces.
These tools and the expertise of these authors make them naturals to contribute to advances in data mining. They come with great expertise and see data mining from the statistical perspective. They see it as part of a more general process of statistical learning from data.
The book is well written and illustrated with many pretty color graphs and figures. Color adds a dimension in pattern recognition and the authors exploit it in this book. It is really the first of its kind that treats data mining from a statistical perspective and is so comprehensive and up-to-date.
The important statistical tools covered in this book under the category of supervised learning include: regression, discriminant analysis, kernel methods, model assessment and selection, bootstrapping, maximum likelihood and Bayesian inference, additive models, classification and regression trees, multivariate adaptive regression splines, boosting, regularization methods, nearest-neighbor classification, k-means clustering algorithms, and neural networks. These methods are illustrated using real problems.
Similarly, under the category of unsupervised learning, clustering and association are covered. The authors cover the latest developments in principal components and principal curves, multidimensional scaling, factor analysis, and projection pursuit.
This book is innovative and fresh. It is an important contribution that will become a classic. The level is between intermediate and advanced, good for an advanced special topics course for graduate students in statistics. A comparable text is the one by Mannila, Hand and Smyth.
This book made effective use of color while maintaining a competitive price, which had a major impact on publishers like Wiley that could not sell a book of this size at this initial price. Wiley is still looking for a comparable book with which to compete with Springer-Verlag. I know this because I heard it from the Wiley acquisitions editor I worked with on my two books.
Review by Jump for The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition (Springer Series in Statistics)
Rating:
This is not an introduction to statistical learning theory. It is a collection of overviews of various statistical methods presented rather than explained to the reader. In order to benefit from this book the reader should have a good background in matrix algebra and should already have a theoretical and working knowledge of the topics covered. For detail on the methods and their real world application the reader should also be prepared to consult other references. Two stars because, fairly or not, it does not have the pedagogical value that I expected of it.