"Introduction and Practice of Python Data Mining", fully release Python data analysis capabilities, master core technologies, easily get started with data mining technology and apply it to practical projects.

" Python Data Mining Introduction and Practice" , fully release Python 's data analysis capabilities, master the core technologies of the big data era, easily get started with data mining technology and apply it to practical projects
.

To buy original foreign language books and reference books, please visit the website of foreign language original books and reference books (bookstore). To study abroad, take foreign language exams, and purchase foreign language teaching materials, please come to the Foreign Language Original Books and Reference Books website (bookstore), and search for "foreign language original books and reference books" on Baidu or Google. You can also find the following two-dimensional Enter the code into the WeChat store or APP to purchase.

When you master a foreign language, you are exposed to a new culture and have a new mindset.

" Python Data Mining Introduction and Practice" , introduces the basic knowledge, basic tools and practical methods of data mining, and takes you on the journey of data mining easily by explaining the algorithm step by step. Using a combination of theory and practice, this book presents how to use decision trees and * forest algorithms to predict the results of NBA games, how to use affinity analysis to recommend movies, and how to use Naive Bayes for social media mining. ,and many more. The book also covers neural networks, deep learning, big data processing, and more. This book is for programmers willing to learn and experiment with data mining.

Basic Information

Author :
[ Australia ] Robert Layton ; translated by Du Chunxiao

Publisher : People's Posts and Telecommunications Press

ISBN:
9787115427106

Publication time : 2016-07

Version : 1

Printing time : 2016-11

Impressions : 2

Binding : paperback

Format : 16

Paper : offset paper

Number of pages : 236 pages

Word count : 372 thousand words

Introduction

As an introduction to data mining, this book introduces the basic knowledge, basic tools and practical methods of data mining, and takes you on a journey of data mining by explaining algorithms step by step. Using a combination of theory and practice, this book presents how to use decision trees and * forest algorithms to predict the results of NBA games, how to use affinity analysis to recommend movies, and how to use Naive Bayes for social media mining. ,and many more. The book also covers neural networks, deep learning, big data processing, and more. This book is for programmers willing to learn and experiment with data mining.

Editor's Choice

In the era of big data with rapidly expanding data scale, data mining, the core technology for identifying important data, is playing an increasingly important role. It will give you "superpowers" to solve practical problems: predict the outcome of sports events, place ads, solve author attribution based on the style of the work, and more. This book uses the Python language that is easy to learn and has rich third-party libraries and a good community atmosphere. From the simple to the deep, it takes real data as the research object, and introduces the implementation method of Python data mining to the readers. Through this book, readers will enter the palace of data mining, thoroughly understand the basic knowledge of data mining, and master the outstanding practice of solving practical problems of data mining!

about the author

Robert
Layton , PhD in Computer Science, expert in cybercrime issues and text analysis. He has been keen on Python programming for many years, participated in the development of many open source libraries such as the scikit-learn library, and served as the mentor of the 2014 " Google Summer of Programming " project. He has worked closely with several global data mining companies to mine real data and develop related applications. His company, dataPipeline , provides data mining and data analysis solutions to multiple industries.

Table of contents

Chapter 1 Begin
　Your Data Mining Journey　　1

1.1 　Introduction to Data Mining　　1

1.2 　Using Python and IPython
Notebook 　　2

1.2.1 　Install Python 　　2

1.2.2 　Install IPython 　　4

1.2.3 　Install scikit-learn library　　5

1.3 　Affinity Analysis Example　　5

1.3.1 　What is Affinity Analysis　　5

1.3.2 　Product Recommendation　　6

1.3.3 Loading datasets　　　in NumPy 6

1.3.4 　Implementing Simple Sorting Rules　　8

1.3.5 　Sorting to find out the rules　　10

1.4 　A Simple Example of a Classification Problem　　12

1.5 　What are classifications　　12

1.5.1 　Preparing the dataset　　13

1.5.2 　Implementing the OneR algorithm　　14

1.5.3 　Testing Algorithms　　16

1.6 　Summary　　18

Chapter 2 Classifying
　with scikit-learn Estimators　　19

2.1 scikit-learn estimators　　19　

2.1.1 　Nearest neighbor algorithm　　20

2.1.2 　Distance metrics　　20

2.1.3 　Loading the dataset　　22

2.1.4 　Efforts to standardize processes　　24

2.1.5 　Running the Algorithm　　24

2.1.6 　Setting parameters　　25

2.2 　The application of pipeline in preprocessing　　27

2.2.1 　Preprocessing example　　28

2.2.2 　Standard Preprocessing　　28

2.2.3 　Assembling　　29

2.3 　Pipeline　　29

2.4 　Summary　　30

Chapter 3 Using
　Decision Trees to Predict Winning Teams　　31

3.1 　Loading the dataset　　31

3.1.1 　Collecting data　　31

3.1.2 Loading datasets　　　with pandas 32

3.1.3 　Dataset Cleaning　　33

3.1.4 　Extracting new features　　34

3.2 　Decision Trees　　35

3.2.1 　Parameters in decision trees　　36

3.2.2 　Using decision trees　　37

3.3 Prediction of　　NBA game results 37　

3.4 　Random Forest　　41

3.4.1 　How effective is the integration of decision trees　　42

3.4.2 　Parameters of the Random Forest Algorithm　　42

3.4.3 　Using the Random Forest Algorithm　　43

3.4.4 　Creating new features　　44

3.5 　Summary　　45

Chapter 4 Recommending Movies
　Using Affinity Analysis　　46

4.1 　亲和性分析　　46

4.1.1 　亲和性分析算法　　47

4.1.2 　选择参数　　47

4.2 　电影推荐问题　　48

4.2.1 　获取数据集　　48

4.2.2 　用pandas加载数据　　49

4.2.3 　稀疏数据格式　　49

4.3 　Apriori算法的实现　　50

4.3.1 　Apriori算法　　51

4.3.2 　实现　　52

4.4 　抽取关联规则　　54

4.5 　小结　　60

第5章
　用转换器抽取特征　　62

5.1 　特征抽取　　62

5.1.1 　在模型中表示事实　　62

5.1.2 　通用的特征创建模式　　64

5.1.3 　创建好的特征　　66

5.2 　特征选择　　67

5.3 　创建特征　　71

5.4 　创建自己的转换器　　75

5.4.1 　转换器API　　76

5.4.2 　实现细节　　76

5.4.3 　单元测试　　77

5.4.4 　组装起来　　79

5.5 　小结　　79

第6章
　使用朴素贝叶斯进行社会媒体挖掘　　80

6.1 　消歧　　80

6.1.1 　从社交网站下载数据　　81

6.1.2 　加载数据集并对其分类　　83

6.1.3 　Twitter数据集重建　　87

6.2 　文本转换器　　90

6.2.1 　词袋　　91

6.2.2 　N元语法　　92

6.2.3 　其他特征　　93

6.3 　朴素贝叶斯　　93

6.3.1 　贝叶斯定理　　93

6.3.2 　朴素贝叶斯算法　　94

6.3.3 　算法应用示例　　95

6.4 　应用　　96

6.4.1 　抽取特征　　97

6.4.2 　将字典转换为矩阵　　98

6.4.3 　训练朴素贝叶斯分类器　　98

6.4.4 　组装起来　　98

6.4.5 　用F1值评估　　99

6.4.6 　从模型中获取更多有用的特征　　100

6.5 　小结　　102

第7章
　用图挖掘找到感兴趣的人　　104

7.1 　加载数据集　　104

7.1.1 　用现有模型进行分类　　106

7.1.2 　获取Twitter好友信息　　107

7.1.3 　构建网络　　110

7.1.4 　创建图　　112

7.1.5 　创建用户相似度图　　114

7.2 　寻找子图　　117

7.2.1 　连通分支　　117

7.2.2 　优化参数选取准则　　119

7.3 　小结　　123

第8章
　用神经网络破解验证码　　124

8.1 　人工神经网络　　124

8.2 　创建数据集　　127

8.2.1 　绘制验证码　　127

8.2.2 　将图像切分为单个的字母　　129

8.2.3 　创建训练集　　130

8.2.4 　根据抽取方法调整训练数据集　　131

8.3 　训练和分类　　132

8.3.1 　反向传播算法　　134

8.3.2 　预测单词　　135

8.4 　用词典提升正确率　　138

8.4.1 　寻找相似的单词　　138

8.4.2 　组装起来　　139

8.5 　小结　　140

第9章
　作者归属问题　　142

9.1 　为作品找作者　　142

9.1.1 　相关应用和使用场景　　143

9.1.2 　作者归属　　143

9.1.3 　获取数据　　144

9.2 　功能词　　147

9.2.1 　统计功能词　　148

9.2.2 　用功能词进行分类　　149

9.3 　支持向量机　　150

9.3.1 　用SVM分类　　151

9.3.2 　内核　　151

9.4 　字符N元语法　　152

9.5 　使用安然公司数据集　　153

9.5.1 　获取安然数据集　　153

9.5.2 　创建数据集加载工具　　154

9.5.3 　组装起来　　158

9.5.4 　评估　　158

9.6 　小结　　160

第10章
　新闻语料分类　　161

10.1 　获取新闻文章　　161

10.1.1 　使用Web API获取数据　　162

10.1.2 　数据资源宝库reddit　　164

10.1.3 　获取数据　　165

10.2 　从任意网站抽取文本　　167

10.2.1 　寻找任意网站网页中的主要内容　　167

10.2.2 　组装起来　　168

10.3 　新闻语料聚类　　170

10.3.1 　k-means算法　　171

10.3.2 　评估结果　　173

10.3.3 　从簇中抽取主题信息　　175

10.3.4 　用聚类算法做转换器　　175

10.4 　聚类融合　　176

10.4.1 　证据累积　　176

10.4.2 　工作原理　　179

10.4.3 　实现　　180

10.5 　线上学习　　181

10.5.1 　线上学习简介　　181

10.5.2 　实现　　182

10.6 　小结　　184

第11章
　用深度学习方法为图像中的物体进行分类　　185

11.1 　物体分类　　185

11.2 　应用场景和目标　　185

11.3 　深度神经网络　　189

11.3.1 　直观感受　　189

11.3.2 　实现　　189

11.3.3 　Theano简介　　190

11.3.4 　Lasagne简介　　191

11.3.5 　用nolearn实现神经网络　　194

11.4 　GPU优化　　197

11.4.1 　什么时候使用GPU进行

计算　　198

11.4.2 　用GPU运行代码　　198

11.5 　环境搭建　　199

11.6 　应用　　201

11.6.1 　获取数据　　201

11.6.2 　创建神经网络　　202

11.6.3 　组装起来　　204

11.7 　小结　　205

第12章
　大数据处理　　206

12.1 　大数据　　206

12.2 　大数据应用场景和目标　　207

12.3 　MapReduce　　208

12.3.1 　直观理解　　209

12.3.2 　单词统计示例　　210

12.3.3 　Hadoop MapReduce　　212

12.4 　应用　　212

12.4.1 　获取数据　　213

12.4.2 　朴素贝叶斯预测　　215

12.5 　小结　　226

附录　接下来的方向　　227