Agile Data Science 2.0. Building Full-Stack Data Analytics Applications with Spark Russell Jurney

Author: Russell Jurney
Publisher: O'Reilly Media
Pages: 352
Available formats: ePub, Mobi

Ebook: 135,15 zł ~~159,00 zł~~ (-15%)

Lowest price in the last 30 days: 95,40 zł

Data science teams looking to turn research into useful analytics applications require not only the right tools, but also the right approach if they’re to succeed. With the revised second edition of this hands-on guide, up-and-coming data scientists will learn how to use the Agile Data Science development methodology to build data applications with Python, Apache Spark, Kafka, and other tools.

Author Russell Jurney demonstrates how to compose a data platform for building, deploying, and refining analytics applications with Apache Kafka, MongoDB, Elasticsearch, D3.js, scikit-learn, and Apache Airflow. You’ll learn an iterative approach that lets you quickly change the kind of analysis you’re doing, depending on what the data is telling you. Publish data science work as a web application, and effect meaningful change in your organization.

- Build value from your data in a series of agile sprints, using the data-value pyramid
- Extract features for statistical models from a single dataset
- Visualize data with charts, and expose different aspects through interactive reports
- Use historical data to predict the future via classification and regression
- Translate predictions into actions
- Get feedback from users after each sprint to keep your project on track

The ebook "Agile Data Science 2.0. Building Full-Stack Data Analytics Applications with Spark" can be read on:

- Inkbook, Kindle, Pocketbook, Onyx Boox, and other e-readers
- Windows, macOS, and other systems
- Windows, Android, iOS, and HarmonyOS systems
- any device or application that supports the PDF, EPub, or Mobi formats

Book details

Ebook ISBN: 978-14-919-6006-6, 9781491960066
Ebook release date: 2017-06-07. The ebook release date is often the day the title went on sale and may differ from the release date of the print edition.
Publication language: English
ePub file size: 9.8 MB
Mobi file size: 9.8 MB

Categories

Computer science » Programming » Agile - Programming

Table of contents

Preface
- Agile Data Science Mailing List
- Data Syndrome, Product Analytics Consultancy
  - Live Training
- Who This Book Is For
- How This Book Is Organized
- Conventions Used in This Book
- Using Code Examples
- O'Reilly Safari
- How to Contact Us
I. Setup
1. Theory
- Introduction
- Definition
  - Methodology as Tweet
  - Agile Data Science Manifesto
    - Iterate, iterate, iterate
    - Ship intermediate output
    - Prototype experiments over implementing tasks
    - Integrate the tyrannical opinion of data
    - Climb up and down the data-value pyramid
    - Discover and pursue the critical path to a killer product
    - Get meta
    - Synthesis
- The Problem with the Waterfall
  - Research Versus Application Development
- The Problem with Agile Software
  - Eventual Quality: Financing Technical Debt
  - The Pull of the Waterfall
- The Data Science Process
  - Setting Expectations
  - Data Science Team Roles
  - Recognizing the Opportunity and the Problem
  - Adapting to Change
    - Harnessing the power of generalists
    - Leveraging agile platforms
    - Sharing intermediate results
- Notes on Process
  - Code Review and Pair Programming
  - Agile Environments: Engineering Productivity
    - Collaboration space
    - Private space
    - Personal space
  - Realizing Ideas with Large-Format Printing
2. Agile Tools
- Scalability = Simplicity
- Agile Data Science Data Processing
- Local Environment Setup
  - System Requirements
  - Setting Up Vagrant
  - Downloading the Data
- EC2 Environment Setup
  - Downloading the Data
- Getting and Running the Code
  - Getting the Code
  - Running the Code
  - Jupyter Notebooks
- Touring the Toolset
  - Agile Stack Requirements
  - Python 3
    - Anaconda and Miniconda
    - Jupyter notebooks
  - Serializing Events with JSON Lines and Parquet
    - JSON for Python
  - Collecting Data
  - Data Processing with Spark
    - Hadoop required
    - Processing data with Spark
  - Publishing Data with MongoDB
    - Booting Mongo
    - Pushing data to MongoDB from PySpark
  - Searching Data with Elasticsearch
    - Elasticsearch and PySpark
      - Making PySpark data searchable
      - Searching our data
    - Python and Elasticsearch with pyelasticsearch
  - Distributed Streams with Apache Kafka
    - Starting up Kafka
    - Topics, console producer, and console consumer
    - Realtime versus batch computing with Spark
    - Kafka in Python with kafka-python
  - Processing Streams with PySpark Streaming
  - Machine Learning with scikit-learn and Spark MLlib
    - Why scikit-learn as well as Spark MLlib?
  - Scheduling with Apache Airflow (Incubating)
    - Installing Airflow
    - Preparing a script for use with Airflow
      - Conditionally initializing PySpark
      - Parameterizing scripts on the command line
    - Creating an Airflow DAG in Python
    - Complete scripts for Airflow
    - Testing a task in Airflow
    - Running a DAG in Airflow
    - Backfilling data in Airflow
    - The power of Airflow
  - Reflecting on Our Workflow
  - Lightweight Web Applications
    - Python and Flask
      - Flask echo microservice
      - Python and Mongo with pymongo
      - Displaying executives in Flask
  - Presenting Our Data
    - Booting Bootstrap
    - Visualizing data with D3.js
- Conclusion
3. Data
- Air Travel Data
  - Flight On-Time Performance Data
  - OpenFlights Database
- Weather Data
- Data Processing in Agile Data Science
  - Structured Versus Semistructured Data
- SQL Versus NoSQL
  - SQL
  - NoSQL and Dataflow Programming
  - Spark: SQL + NoSQL
  - Schemas in NoSQL
  - Data Serialization
  - Extracting and Exposing Features in Evolving Schemas
- Conclusion
II. Climbing the Pyramid
4. Collecting and Displaying Records
- Putting It All Together
- Collecting and Serializing Flight Data
- Processing and Publishing Flight Records
  - Publishing Flight Records to MongoDB
- Presenting Flight Records in a Browser
  - Serving Flights with Flask and pymongo
  - Rendering HTML5 with Jinja2
- Agile Checkpoint
- Listing Flights
  - Listing Flights with MongoDB
  - Paginating Data
    - Reinventing the wheel?
    - Serving paginated data
    - Prototyping back from HTML
- Searching for Flights
  - Creating Our Index
  - Publishing Flights to Elasticsearch
  - Searching Flights on the Web
- Conclusion
5. Visualizing Data with Charts and Tables
- Chart Quality: Iteration Is Essential
- Scaling a Database in the Publish/Decorate Model
  - First Order Form
  - Second Order Form
  - Third Order Form
  - Choosing a Form
- Exploring Seasonality
  - Querying and Presenting Flight Volume
    - Iterating on our first chart
- Extracting Metal (Airplanes [Entities])
  - Extracting Tail Numbers
    - Data processing: batch or realtime?
    - Grouping and sorting data in Spark
    - Publishing airplanes with Mongo
    - Serving airplanes with Flask
    - Ensuring database performance with indexes
    - Linking back in to our new entity
    - Information architecture
  - Assessing Our Airplanes
- Data Enrichment
  - Reverse Engineering a Web Form
  - Gathering Tail Numbers
  - Automating Form Submission
  - Extracting Data from HTML
  - Evaluating Enriched Data
- Conclusion
6. Exploring Data with Reports
- Extracting Airlines (Entities)
  - Defining Airlines as Groups of Airplanes Using PySpark
  - Querying Airline Data in Mongo
  - Building an Airline Page in Flask
  - Linking Back to Our Airline Page
  - Creating an All Airlines Home Page
- Curating Ontologies of Semi-structured Data
- Improving Airlines
  - Adding Names to Carrier Codes
  - Incorporating Wikipedia Content
  - Publishing Enriched Airlines to Mongo
  - Enriched Airlines on the Web
- Investigating Airplanes (Entities)
  - SQL Subqueries Versus Dataflow Programming
  - Dataflow Programming Without Subqueries
  - Subqueries in Spark SQL
  - Creating an Airplanes Home Page
  - Adding Search to the Airplanes Page
    - Code versus configuration
    - Configuring a search widget
    - Building an Elasticsearch query programmatically
  - Creating a Manufacturers Bar Chart
  - Iterating on the Manufacturers Bar Chart
  - Entity Resolution: Another Chart Iteration
    - Entity resolution in 30 seconds
    - Resolving manufacturers in PySpark
    - Updating our chart
    - Boeing versus Airbus revisited
    - Cleanliness: Benefits of entity resolution
- Conclusion
7. Making Predictions
- The Role of Predictions
- Predict What?
- Introduction to Predictive Analytics
  - Making Predictions
    - Features
    - Regression
    - Classification
- Exploring Flight Delays
- Extracting Features with PySpark
- Building a Regression with scikit-learn
  - Loading Our Data
  - Sampling Our Data
  - Vectorizing Our Results
  - Preparing Our Training Data
  - Vectorizing Our Features
  - Sparse Versus Dense Matrices
  - Preparing an Experiment
  - Training Our Model
  - Testing Our Model
  - Conclusion
- Building a Classifier with Spark MLlib
  - Loading Our Training Data with a Specified Schema
  - Addressing Nulls
  - Replacing FlightNum with Route
  - Bucketizing a Continuous Variable for Classification
    - Determining arrival delay buckets
      - Iterative visualization with histograms
      - Bucket quest conclusion
    - Bucketizing with a DataFrame UDF
    - Bucketizing with pyspark.ml.feature.Bucketizer
  - Feature Vectorization with pyspark.ml.feature
    - Vectorizing categorical columns with Spark ML
    - Vectorizing continuous variables and indexes with Spark ML
  - Classification with Spark ML
    - Test/train split with DataFrames
    - Creating and fitting a model
    - Evaluating a model
    - Conclusion
- Conclusion
8. Deploying Predictive Systems
- Deploying a scikit-learn Application as a Web Service
  - Saving and Loading scikit-learn Models
    - Saving and loading objects using pickle
    - Saving and loading models using sklearn.externals.joblib
  - Groundwork for Serving Predictions
  - Creating Our Flight Delay Regression API
    - Filling in the predict_utils API
  - Testing Our API
  - Pulling Our API into Our Product
- Deploying Spark ML Applications in Batch with Airflow
  - Gathering Training Data in Production
  - Training, Storing, and Loading Spark ML Models
  - Creating Prediction Requests in Mongo
    - Feeding Mongo recommendation tasks from a Flask API
    - A frontend for generating prediction requests
    - Making a prediction request
  - Fetching Prediction Requests from MongoDB
  - Making Predictions in a Batch with Spark ML
    - Loading Spark ML models in PySpark
    - Making predictions with Spark ML
  - Storing Predictions in MongoDB
  - Displaying Batch Prediction Results in Our Web Application
  - Automating Our Workflow with Apache Airflow (Incubating)
    - Setting up Airflow
    - Creating a DAG for creating our model
    - Creating a DAG for operating our model
    - Using Airflow to manage and execute DAGs and tasks
      - Linking our Airflow script to the Airflow DAGs directory
      - Executing our Airflow setup script
      - Querying Airflow from the command line
      - Testing tasks in Airflow
      - Testing DAGs in Airflow
      - Monitoring tasks in the Airflow web interface
  - Conclusion
- Deploying Spark ML via Spark Streaming
  - Gathering Training Data in Production
  - Training, Storing, and Loading Spark ML Models
  - Sending Prediction Requests to Kafka
    - Setting up Kafka
      - Start Zookeeper
      - Start the Kafka server
      - Create a topic
      - Verify our new prediction request topic
    - Feeding Kafka recommendation tasks from a Flask API
    - A frontend for generating prediction requests
      - Polling requests and LinkedIn InMaps
      - A controller for the page
      - An API controller for serving prediction responses
      - Creating a template with a polling form
    - Making a prediction request
  - Making Predictions in Spark Streaming
  - Testing the Entire System
    - Overall system summary
    - Rubber meets road
    - Paydirt!
- Conclusion
9. Improving Predictions
- Fixing Our Prediction Problem
- When to Improve Predictions
- Improving Prediction Performance
  - Experimental Adhesion Method: See What Sticks
  - Establishing Rigorous Metrics for Experiments
    - Defining our classification metrics
    - Feature importance
    - Implementing a more rigorous experiment
    - Comparing experiments to determine improvements
    - Inspecting changes in feature importance
    - Conclusion
  - Time of Day as a Feature
- Incorporating Airplane Data
  - Extracting Airplane Features
  - Incorporating Airplane Features into Our Classifier Model
- Incorporating Flight Time
- Conclusion
A. Manual Installation
- Installing Hadoop
- Installing Spark
- Installing MongoDB
- Installing the MongoDB Java Driver
- Installing mongo-hadoop
  - Building mongo-hadoop
  - Installing pymongo_spark
- Installing Elasticsearch
- Installing Elasticsearch for Hadoop
- Setting Up Our Spark Environment
- Installing Kafka
- Installing scikit-learn
- Installing Zeppelin
Index