Agile Data Science 2.0. Building Full-Stack Data Analytics Applications with Spark
- Author:
- Russell Jurney
![Agile Data Science 2.0. Building Full-Stack Data Analytics Applications with Spark Russell Jurney - ebook front cover](https://static01.helion.com.pl/global/okladki/326x466/e_0kip.png)
![Agile Data Science 2.0. Building Full-Stack Data Analytics Applications with Spark Russell Jurney - ebook back cover](https://static01.helion.com.pl/global/okladki-tyl/326x466/e_0kip.png)
- Pages:
- 352
- Available formats:
- ePub, Mobi
Ebook description: Agile Data Science 2.0. Building Full-Stack Data Analytics Applications with Spark
Data science teams looking to turn research into useful analytics applications require not only the right tools, but also the right approach if they’re to succeed. With the revised second edition of this hands-on guide, up-and-coming data scientists will learn how to use the Agile Data Science development methodology to build data applications with Python, Apache Spark, Kafka, and other tools.
Author Russell Jurney demonstrates how to compose a data platform for building, deploying, and refining analytics applications with Apache Kafka, MongoDB, Elasticsearch, d3.js, scikit-learn, and Apache Airflow. You’ll learn an iterative approach that lets you quickly change the kind of analysis you’re doing, depending on what the data is telling you. Publish data science work as a web application, and effect meaningful change in your organization.
- Build value from your data in a series of agile sprints, using the data-value pyramid
- Extract features for statistical models from a single dataset
- Visualize data with charts, and expose different aspects through interactive reports
- Use historical data to predict the future via classification and regression
- Translate predictions into actions
- Get feedback from users after each sprint to keep your project on track
You can read the ebook "Agile Data Science 2.0. Building Full-Stack Data Analytics Applications with Spark" on:
- Inkbook, Kindle, Pocketbook, Onyx Boox, and other e-readers
- Windows, macOS, and other systems
- Windows, Android, iOS, and HarmonyOS systems
- any device or application that supports the PDF, EPub, or Mobi format
Ebook details
- Ebook ISBN:
- 978-14-919-6006-6, 9781491960066
- Ebook release date:
- 2017-06-07
The ebook release date is often the day the title went on sale and may not match the release date of the print edition. Additional information can be found in the free sample. If in doubt, contact us at sklep@ebookpoint.pl.
- Publication language:
- English
- ePub file size:
- 9.8MB
- Mobi file size:
- 9.8MB
Ebook table of contents
- Preface
- Agile Data Science Mailing List
- Data Syndrome, Product Analytics Consultancy
- Live Training
- Who This Book Is For
- How This Book Is Organized
- Conventions Used in This Book
- Using Code Examples
- O'Reilly Safari
- How to Contact Us
- I. Setup
- 1. Theory
- Introduction
- Definition
- Methodology as Tweet
- Agile Data Science Manifesto
- Iterate, iterate, iterate
- Ship intermediate output
- Prototype experiments over implementing tasks
- Integrate the tyrannical opinion of data
- Climb up and down the data-value pyramid
- Discover and pursue the critical path to a killer product
- Get meta
- Synthesis
- The Problem with the Waterfall
- Research Versus Application Development
- The Problem with Agile Software
- Eventual Quality: Financing Technical Debt
- The Pull of the Waterfall
- The Data Science Process
- Setting Expectations
- Data Science Team Roles
- Recognizing the Opportunity and the Problem
- Adapting to Change
- Harnessing the power of generalists
- Leveraging agile platforms
- Sharing intermediate results
- Notes on Process
- Code Review and Pair Programming
- Agile Environments: Engineering Productivity
- Collaboration space
- Private space
- Personal space
- Realizing Ideas with Large-Format Printing
- 2. Agile Tools
- Scalability = Simplicity
- Agile Data Science Data Processing
- Local Environment Setup
- System Requirements
- Setting Up Vagrant
- Downloading the Data
- EC2 Environment Setup
- Downloading the Data
- Getting and Running the Code
- Getting the Code
- Running the Code
- Jupyter Notebooks
- Touring the Toolset
- Agile Stack Requirements
- Python 3
- Anaconda and Miniconda
- Jupyter notebooks
- Serializing Events with JSON Lines and Parquet
- JSON for Python
- Collecting Data
- Data Processing with Spark
- Hadoop required
- Processing data with Spark
- Publishing Data with MongoDB
- Booting Mongo
- Pushing data to MongoDB from PySpark
- Searching Data with Elasticsearch
- Elasticsearch and PySpark
- Making PySpark data searchable
- Searching our data
- Python and Elasticsearch with pyelasticsearch
- Distributed Streams with Apache Kafka
- Starting up Kafka
- Topics, console producer, and console consumer
- Realtime versus batch computing with Spark
- Kafka in Python with kafka-python
- Processing Streams with PySpark Streaming
- Machine Learning with scikit-learn and Spark MLlib
- Why scikit-learn as well as Spark MLlib?
- Scheduling with Apache Airflow (Incubating)
- Installing Airflow
- Preparing a script for use with Airflow
- Conditionally initializing PySpark
- Parameterizing scripts on the command line
- Creating an Airflow DAG in Python
- Complete scripts for Airflow
- Testing a task in Airflow
- Running a DAG in Airflow
- Backfilling data in Airflow
- The power of Airflow
- Reflecting on Our Workflow
- Lightweight Web Applications
- Python and Flask
- Flask echo microservice
- Python and Mongo with pymongo
- Displaying executives in Flask
- Presenting Our Data
- Booting Bootstrap
- Visualizing data with D3.js
- Conclusion
- 3. Data
- Air Travel Data
- Flight On-Time Performance Data
- OpenFlights Database
- Weather Data
- Data Processing in Agile Data Science
- Structured Versus Semistructured Data
- SQL Versus NoSQL
- SQL
- NoSQL and Dataflow Programming
- Spark: SQL + NoSQL
- Schemas in NoSQL
- Data Serialization
- Extracting and Exposing Features in Evolving Schemas
- Conclusion
- II. Climbing the Pyramid
- 4. Collecting and Displaying Records
- Putting It All Together
- Collecting and Serializing Flight Data
- Processing and Publishing Flight Records
- Publishing Flight Records to MongoDB
- Presenting Flight Records in a Browser
- Serving Flights with Flask and pymongo
- Rendering HTML5 with Jinja2
- Agile Checkpoint
- Listing Flights
- Listing Flights with MongoDB
- Paginating Data
- Reinventing the wheel?
- Serving paginated data
- Prototyping back from HTML
- Searching for Flights
- Creating Our Index
- Publishing Flights to Elasticsearch
- Searching Flights on the Web
- Conclusion
- 5. Visualizing Data with Charts and Tables
- Chart Quality: Iteration Is Essential
- Scaling a Database in the Publish/Decorate Model
- First Order Form
- Second Order Form
- Third Order Form
- Choosing a Form
- Exploring Seasonality
- Querying and Presenting Flight Volume
- Iterating on our first chart
- Extracting Metal (Airplanes [Entities])
- Extracting Tail Numbers
- Data processing: batch or realtime?
- Grouping and sorting data in Spark
- Publishing airplanes with Mongo
- Serving airplanes with Flask
- Ensuring database performance with indexes
- Linking back in to our new entity
- Information architecture
- Assessing Our Airplanes
- Data Enrichment
- Reverse Engineering a Web Form
- Gathering Tail Numbers
- Automating Form Submission
- Extracting Data from HTML
- Evaluating Enriched Data
- Conclusion
- 6. Exploring Data with Reports
- Extracting Airlines (Entities)
- Defining Airlines as Groups of Airplanes Using PySpark
- Querying Airline Data in Mongo
- Building an Airline Page in Flask
- Linking Back to Our Airline Page
- Creating an All Airlines Home Page
- Curating Ontologies of Semi-structured Data
- Improving Airlines
- Adding Names to Carrier Codes
- Incorporating Wikipedia Content
- Publishing Enriched Airlines to Mongo
- Enriched Airlines on the Web
- Investigating Airplanes (Entities)
- SQL Subqueries Versus Dataflow Programming
- Dataflow Programming Without Subqueries
- Subqueries in Spark SQL
- Creating an Airplanes Home Page
- Adding Search to the Airplanes Page
- Code versus configuration
- Configuring a search widget
- Building an Elasticsearch query programmatically
- Creating a Manufacturers Bar Chart
- Iterating on the Manufacturers Bar Chart
- Entity Resolution: Another Chart Iteration
- Entity resolution in 30 seconds
- Resolving manufacturers in PySpark
- Updating our chart
- Boeing versus Airbus revisited
- Cleanliness: Benefits of entity resolution
- Conclusion
- 7. Making Predictions
- The Role of Predictions
- Predict What?
- Introduction to Predictive Analytics
- Making Predictions
- Features
- Regression
- Classification
- Exploring Flight Delays
- Extracting Features with PySpark
- Building a Regression with scikit-learn
- Loading Our Data
- Sampling Our Data
- Vectorizing Our Results
- Preparing Our Training Data
- Vectorizing Our Features
- Sparse Versus Dense Matrices
- Preparing an Experiment
- Training Our Model
- Testing Our Model
- Conclusion
- Building a Classifier with Spark MLlib
- Loading Our Training Data with a Specified Schema
- Addressing Nulls
- Replacing FlightNum with Route
- Bucketizing a Continuous Variable for Classification
- Determining arrival delay buckets
- Iterative visualization with histograms
- Bucket quest conclusion
- Bucketizing with a DataFrame UDF
- Bucketizing with pyspark.ml.feature.Bucketizer
- Feature Vectorization with pyspark.ml.feature
- Vectorizing categorical columns with Spark ML
- Vectorizing continuous variables and indexes with Spark ML
- Classification with Spark ML
- Test/train split with DataFrames
- Creating and fitting a model
- Evaluating a model
- Conclusion
- Conclusion
- 8. Deploying Predictive Systems
- Deploying a scikit-learn Application as a Web Service
- Saving and Loading scikit-learn Models
- Saving and loading objects using pickle
- Saving and loading models using sklearn.externals.joblib
- Groundwork for Serving Predictions
- Creating Our Flight Delay Regression API
- Filling in the predict_utils API
- Testing Our API
- Pulling Our API into Our Product
- Deploying Spark ML Applications in Batch with Airflow
- Gathering Training Data in Production
- Training, Storing, and Loading Spark ML Models
- Creating Prediction Requests in Mongo
- Feeding Mongo recommendation tasks from a Flask API
- A frontend for generating prediction requests
- Making a prediction request
- Fetching Prediction Requests from MongoDB
- Making Predictions in a Batch with Spark ML
- Loading Spark ML models in PySpark
- Making predictions with Spark ML
- Storing Predictions in MongoDB
- Displaying Batch Prediction Results in Our Web Application
- Automating Our Workflow with Apache Airflow (Incubating)
- Setting up Airflow
- Creating a DAG for creating our model
- Creating a DAG for operating our model
- Using Airflow to manage and execute DAGs and tasks
- Linking our Airflow script to the Airflow DAGs directory
- Executing our Airflow setup script
- Querying Airflow from the command line
- Testing tasks in Airflow
- Testing DAGs in Airflow
- Monitoring tasks in the Airflow web interface
- Conclusion
- Deploying Spark ML via Spark Streaming
- Gathering Training Data in Production
- Training, Storing, and Loading Spark ML Models
- Sending Prediction Requests to Kafka
- Setting up Kafka
- Start Zookeeper
- Start the Kafka server
- Create a topic
- Verify our new prediction request topic
- Feeding Kafka recommendation tasks from a Flask API
- A frontend for generating prediction requests
- Polling requests and LinkedIn InMaps
- A controller for the page
- An API controller for serving prediction responses
- Creating a template with a polling form
- Making a prediction request
- Making Predictions in Spark Streaming
- Testing the Entire System
- Overall system summary
- Rubber meets road
- Paydirt!
- Conclusion
- 9. Improving Predictions
- Fixing Our Prediction Problem
- When to Improve Predictions
- Improving Prediction Performance
- Experimental Adhesion Method: See What Sticks
- Establishing Rigorous Metrics for Experiments
- Defining our classification metrics
- Feature importance
- Implementing a more rigorous experiment
- Comparing experiments to determine improvements
- Inspecting changes in feature importance
- Conclusion
- Time of Day as a Feature
- Incorporating Airplane Data
- Extracting Airplane Features
- Incorporating Airplane Features into Our Classifier Model
- Incorporating Flight Time
- Conclusion
- A. Manual Installation
- Installing Hadoop
- Installing Spark
- Installing MongoDB
- Installing the MongoDB Java Driver
- Installing mongo-hadoop
- Building mongo-hadoop
- Installing pymongo_spark
- Installing Elasticsearch
- Installing Elasticsearch for Hadoop
- Setting Up Our Spark Environment
- Installing Kafka
- Installing scikit-learn
- Installing Zeppelin
- Index