Blueprints for Text Analytics Using Python
- Pages: 424
- Available formats: ePub, Mobi
Ebook description: Blueprints for Text Analytics Using Python
Turning text into valuable information is essential for businesses looking to gain a competitive advantage. With recent improvements in natural language processing (NLP), users now have many options for solving complex challenges. But it's not always clear which NLP tools or libraries would work for a business's needs, or which techniques you should use and in what order.
This practical book provides data scientists and developers with blueprints for best practice solutions to common tasks in text analytics and natural language processing. Authors Jens Albrecht, Sidharth Ramachandran, and Christian Winkler provide real-world case studies and detailed code examples in Python to help you get started quickly.
- Extract data from APIs and web pages
- Prepare textual data for statistical analysis and machine learning
- Use machine learning for classification, topic modeling, and summarization
- Explain AI models and classification results
- Explore and visualize semantic similarities with word embeddings
- Identify customer sentiment in product reviews
- Create a knowledge graph based on named entities and their relations
Ebook details
- Ebook ISBN: 978-14-920-7403-8, 9781492074038
- Ebook publication date: 2020-12-04. The ebook publication date is often the day the title went on sale and may differ from the publication date of the print edition.
- Publication language: English
- ePub file size: 17.5 MB
- Mobi file size: 43.9 MB
Table of contents
- Preface
- Approach of the Book
- Prerequisites
- Some Important Libraries to Know
- Books to Read
- Conventions Used in This Book
- Using Code Examples
- O'Reilly Online Learning
- How to Contact Us
- Acknowledgments
- 1. Gaining Early Insights from Textual Data
- What You'll Learn and What We'll Build
- Exploratory Data Analysis
- Introducing the Dataset
- Blueprint: Getting an Overview of the Data with Pandas
- Calculating Summary Statistics for Columns
- Checking for Missing Data
- Plotting Value Distributions
- Comparing Value Distributions Across Categories
- Visualizing Developments Over Time
- Blueprint: Building a Simple Text Preprocessing Pipeline
- Performing Tokenization with Regular Expressions
- Treating Stop Words
- Processing a Pipeline with One Line of Code
- Blueprints for Word Frequency Analysis
- Blueprint: Counting Words with a Counter
- Blueprint: Creating a Frequency Diagram
- Blueprint: Creating Word Clouds
- Blueprint: Ranking with TF-IDF
- Blueprint: Finding a Keyword-in-Context
- Blueprint: Analyzing N-Grams
- Blueprint: Comparing Frequencies Across Time Intervals and Categories
- Creating Frequency Timelines
- Creating Frequency Heatmaps
- Closing Remarks
- 2. Extracting Textual Insights with APIs
- What You'll Learn and What We'll Build
- Application Programming Interfaces
- Blueprint: Extracting Data from an API Using the Requests Module
- Pagination
- Rate Limiting
- Blueprint: Extracting Twitter Data with Tweepy
- Obtaining Credentials
- Installing and Configuring Tweepy
- Extracting Data from the Search API
- Extracting Data from a User's Timeline
- Extracting Data from the Streaming API
- Closing Remarks
- 3. Scraping Websites and Extracting Data
- What You'll Learn and What We'll Build
- Scraping and Data Extraction
- Introducing the Reuters News Archive
- URL Generation
- Blueprint: Downloading and Interpreting robots.txt
- Blueprint: Finding URLs from sitemap.xml
- Blueprint: Finding URLs from RSS
- Downloading Data
- Blueprint: Downloading HTML Pages with Python
- Blueprint: Downloading HTML Pages with wget
- Extracting Semistructured Data
- Blueprint: Extracting Data with Regular Expressions
- Blueprint: Using an HTML Parser for Extraction
- Extracting the title/headline
- Extracting the article text
- Extracting image captions
- Extracting the URL
- Extracting list information (authors)
- Semantic and nonsemantic content
- Extracting text of links (section)
- Extracting reading time
- Extracting attributes (ID)
- Extracting attribution
- Extracting timestamp
- Blueprint: Spidering
- Introducing the Use Case
- Error Handling and Production-Quality Software
- Density-Based Text Extraction
- Extracting Reuters Content with Readability
- Summary Density-Based Text Extraction
- All-in-One Approach
- Blueprint: Scraping the Reuters Archive with Scrapy
- Possible Problems with Scraping
- Closing Remarks and Recommendation
- 4. Preparing Textual Data for Statistics and Machine Learning
- What You'll Learn and What We'll Build
- A Data Preprocessing Pipeline
- Introducing the Dataset: Reddit Self-Posts
- Loading Data Into Pandas
- Blueprint: Standardizing Attribute Names
- Saving and Loading a DataFrame
- Cleaning Text Data
- Blueprint: Identify Noise with Regular Expressions
- Blueprint: Removing Noise with Regular Expressions
- Blueprint: Character Normalization with textacy
- Blueprint: Pattern-Based Data Masking with textacy
- Tokenization
- Blueprint: Tokenization with Regular Expressions
- Tokenization with NLTK
- Recommendations for Tokenization
- Linguistic Processing with spaCy
- Instantiating a Pipeline
- Processing Text
- Blueprint: Customizing Tokenization
- Blueprint: Working with Stop Words
- Blueprint: Extracting Lemmas Based on Part of Speech
- Blueprint: Extracting Noun Phrases
- Blueprint: Extracting Named Entities
- Feature Extraction on a Large Dataset
- Blueprint: Creating One Function to Get It All
- Blueprint: Using spaCy on a Large Dataset
- Persisting the Result
- A Note on Execution Time
- There Is More
- Language Detection
- Spell-Checking
- Token Normalization
- Closing Remarks and Recommendations
- 5. Feature Engineering and Syntactic Similarity
- What You'll Learn and What We'll Build
- A Toy Dataset for Experimentation
- Blueprint: Building Your Own Vectorizer
- Enumerating the Vocabulary
- Vectorizing Documents
- Out-of-vocabulary documents
- The Document-Term Matrix
- Calculating similarities
- The Similarity Matrix
- Bag-of-Words Models
- Blueprint: Using scikit-learn's CountVectorizer
- Fitting the vocabulary
- Transforming the documents to vectors
- Blueprint: Using scikit-learn's CountVectorizer
- Blueprint: Calculating Similarities
- TF-IDF Models
- Optimized Document Vectors with TfidfTransformer
- Introducing the ABC Dataset
- Blueprint: Reducing Feature Dimensions
- Removing stop words
- Minimum frequency
- Maximum frequency
- Blueprint: Improving Features by Making Them More Specific
- Performing linguistic analysis
- Blueprint: Using Lemmas Instead of Words for Vectorizing Documents
- Blueprint: Limit Word Types
- Blueprint: Remove Most Common Words
- Blueprint: Adding Context via N-Grams
- Options of TfidfVectorizer
- Think very carefully about feature dimensions
- Keep number of dimensions in mind
- Options of TfidfVectorizer
- Syntactic Similarity in the ABC Dataset
- Blueprint: Finding Most Similar Headlines to a Made-up Headline
- Blueprint: Finding the Two Most Similar Documents in a Large Corpus (Much More Difficult)
- Blueprint: Finding Related Words
- Tips for Long-Running Programs like Syntactic Similarity
- Summary and Conclusion
- 6. Text Classification Algorithms
- What You'll Learn and What We'll Build
- Introducing the Java Development Tools Bug Dataset
- Blueprint: Building a Text Classification System
- Step 1: Data Preparation
- Step 2: Train-Test Split
- Step 3: Training the Machine Learning Model
- Step 4: Model Evaluation
- Precision and recall
- Class imbalance
- Final Blueprint for Text Classification
- Blueprint: Using Cross-Validation to Estimate Realistic Accuracy Metrics
- Blueprint: Performing Hyperparameter Tuning with Grid Search
- Blueprint Recap and Conclusion
- Closing Remarks
- Further Reading
- 7. How to Explain a Text Classifier
- What You'll Learn and What We'll Build
- Blueprint: Determining Classification Confidence Using Prediction Probability
- Blueprint: Measuring Feature Importance of Predictive Models
- Blueprint: Using LIME to Explain the Classification Results
- Blueprint: Using ELI5 to Explain the Classification Results
- Blueprint: Using Anchor to Explain the Classification Results
- Using the Distribution with Masked Words
- Working with Real Words
- Closing Remarks
- 8. Unsupervised Methods: Topic Modeling and Clustering
- What You'll Learn and What We'll Build
- Our Dataset: UN General Debates
- Checking Statistics of the Corpus
- Preparations
- Nonnegative Matrix Factorization (NMF)
- Blueprint: Creating a Topic Model Using NMF for Documents
- Blueprint: Creating a Topic Model for Paragraphs Using NMF
- Latent Semantic Analysis/Indexing
- Blueprint: Creating a Topic Model for Paragraphs with SVD
- Latent Dirichlet Allocation
- Blueprint: Creating a Topic Model for Paragraphs with LDA
- Blueprint: Visualizing LDA Results
- Blueprint: Using Word Clouds to Display and Compare Topic Models
- Blueprint: Calculating Topic Distribution of Documents and Time Evolution
- Using Gensim for Topic Modeling
- Blueprint: Preparing Data for Gensim
- Blueprint: Performing Nonnegative Matrix Factorization with Gensim
- Blueprint: Using LDA with Gensim
- Blueprint: Calculating Coherence Scores
- Blueprint: Finding the Optimal Number of Topics
- Blueprint: Creating a Hierarchical Dirichlet Process with Gensim
- Blueprint: Using Clustering to Uncover the Structure of Text Data
- Further Ideas
- Summary and Recommendation
- Conclusion
- 9. Text Summarization
- What You'll Learn and What We'll Build
- Text Summarization
- Extractive Methods
- Data Preprocessing
- Blueprint: Summarizing Text Using Topic Representation
- Identifying Important Words with TF-IDF Values
- LSA Algorithm
- Blueprint: Summarizing Text Using an Indicator Representation
- Measuring the Performance of Text Summarization Methods
- Blueprint: Summarizing Text Using Machine Learning
- Step 1: Creating Target Labels
- Step 2: Adding Features to Assist Model Prediction
- Step 3: Build a Machine Learning Model
- Closing Remarks
- Further Reading
- 10. Exploring Semantic Relationships with Word Embeddings
- What You'll Learn and What We'll Build
- The Case for Semantic Embeddings
- Word Embeddings
- Analogy Reasoning with Word Embeddings
- Types of Embeddings
- Word2Vec
- GloVe
- FastText
- Deep contextualized embeddings
- Blueprint: Using Similarity Queries on Pretrained Models
- Loading a Pretrained Model
- Similarity Queries
- Blueprints for Training and Evaluating Your Own Embeddings
- Data Preparation
- Phrases
- Data Preparation
- Blueprint: Training Models with Gensim
- Blueprint: Evaluating Different Models
- Looking for similar concepts
- Analogy reasoning on our own models
- Blueprints for Visualizing Embeddings
- Blueprint: Applying Dimensionality Reduction
- Blueprint: Using the TensorFlow Embedding Projector
- Blueprint: Constructing a Similarity Tree
- Closing Remarks
- Further Reading
- 11. Performing Sentiment Analysis on Text Data
- What You'll Learn and What We'll Build
- Sentiment Analysis
- Introducing the Amazon Customer Reviews Dataset
- Blueprint: Performing Sentiment Analysis Using Lexicon-Based Approaches
- Bing Liu Lexicon
- Disadvantages of a Lexicon-Based Approach
- Supervised Learning Approaches
- Preparing Data for a Supervised Learning Approach
- Blueprint: Vectorizing Text Data and Applying a Supervised Machine Learning Algorithm
- Step 1: Data Preparation
- Step 2: Train-Test Split
- Step 3: Text Vectorization
- Step 4: Training the Machine Learning Model
- Pretrained Language Models Using Deep Learning
- Deep Learning and Transfer Learning
- Blueprint: Using the Transfer Learning Technique and a Pretrained Language Model
- Step 1: Loading Models and Tokenization
- Step 2: Model Training
- Step 3: Model Evaluation
- Closing Remarks
- Further Reading
- 12. Building a Knowledge Graph
- What You'll Learn and What We'll Build
- Knowledge Graphs
- Information Extraction
- Introducing the Dataset
- Named-Entity Recognition
- Blueprint: Using Rule-Based Named-Entity Recognition
- Blueprint: Normalizing Named Entities
- Merging Entity Tokens
- Coreference Resolution
- Blueprint: Using spaCy's Token Extensions
- Blueprint: Performing Alias Resolution
- Blueprint: Resolving Name Variations
- Blueprint: Performing Anaphora Resolution with NeuralCoref
- Name Normalization
- Entity Linking
- Blueprint: Creating a Co-Occurrence Graph
- Extracting Co-Occurrences from a Document
- Visualizing the Graph with Gephi
- Relation Extraction
- Blueprint: Extracting Relations Using Phrase Matching
- Blueprint: Extracting Relations Using Dependency Trees
- Creating the Knowledge Graph
- Don't Blindly Trust the Results
- Closing Remarks
- Further Reading
- 13. Using Text Analytics in Production
- What You'll Learn and What We'll Build
- Blueprint: Using Conda to Create Reproducible Python Environments
- Blueprint: Using Containers to Create Reproducible Environments
- Blueprint: Creating a REST API for Your Text Analytics Model
- Blueprint: Deploying and Scaling Your API Using a Cloud Provider
- Blueprint: Automatically Versioning and Deploying Builds
- Closing Remarks
- Further Reading
- Index