
# NLP Pipeline

## 🔖 Outline

Section outline as covered in the book. 

* Data Acquisition
* Text Extraction and Cleanup
  * HTML Parsing and Cleanup
  * Unicode Normalization
  * Spelling Correction
  * System-Specific Error Correction
* Pre-Processing
  * Preliminaries
  * Frequent Steps
  * Other Pre-Processing Steps
  * Advanced Processing
* Feature Engineering
  * Classical NLP/ML Pipeline
  * DL Pipeline
* Modeling
  * Start with Simple Heuristics
  * Building Your Model
  * Building THE Model
* Evaluation
  * Intrinsic Evaluation
  * Extrinsic Evaluation
* Post-Modeling Phases
  * Deployment
  * Monitoring
  * Model Updating
* Working with Other Languages
* Case Study
* Wrapping Up
* References


## 🗒️ Notebooks

Set of notebooks associated with the chapter. 

1. **[Web Scraping using BeautifulSoup](https://github.com/practical-nlp/practical-nlp/blob/master/Ch2/01_WebScraping_using_BeautifulSoup.ipynb)**: Here we demonstrate how to scrape a web page (using stackoverflow.com as an example) and parse its HTML with bs4 to find and extract relevant information.

2. **[Web Scraping using Scrapy](https://github.com/practical-nlp/practical-nlp/tree/master/Ch2/02_WebScraping_using_scrapy)**: Here we demonstrate how to use Scrapy to scrape data from websites and save it using a pipeline.

3. **[Text Extraction from Images](https://github.com/practical-nlp/practical-nlp/blob/master/Ch2/03_Extracting_text_from_images_tesseract.ipynb)**: Here we demonstrate how to use pytesseract to extract text from images.

4. **[Common Pre-processing Steps](https://github.com/practical-nlp/practical-nlp/blob/master/Ch2/04_Tokenization_Stemming_lemmatization_stopword_postagging.ipynb)**: Here we demonstrate the most commonly performed text pre-processing steps using various libraries. 

5. **[Data Augmentation](https://github.com/practical-nlp/practical-nlp/blob/master/Ch2/05_Data_Augmentation_Using_NLPaug.ipynb)**: Here we demonstrate data augmentation using nlpaug.
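
To give a rough feel for the text extraction and pre-processing steps these notebooks cover, here is a minimal sketch that uses only the Python standard library. It is not taken from the notebooks, which rely on libraries such as bs4 and others; the helper names (`TextExtractor`, `html_to_text`, `preprocess`) and the tiny `STOP_WORDS` set are assumptions made purely for illustration.

```python
import re
import unicodedata
from html.parser import HTMLParser

# Tiny illustrative stop word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "the", "is", "to", "of", "and", "in"}


class TextExtractor(HTMLParser):
    """Collects text nodes and skips the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(html):
    """HTML cleanup: strip tags and collapse runs of whitespace."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


def normalize(text):
    """Unicode normalization (NFKC folds ligatures etc.) plus lowercasing."""
    return unicodedata.normalize("NFKC", text).lower()


def tokenize(text):
    """Naive word tokenizer: runs of word characters."""
    return re.findall(r"\w+", text)


def remove_stop_words(tokens):
    return [t for t in tokens if t not in STOP_WORDS]


def preprocess(html):
    """Chain the steps: HTML cleanup, normalization, tokenization, filtering."""
    return remove_stop_words(tokenize(normalize(html_to_text(html))))
```

Calling `preprocess` on a small HTML string returns a list of lowercase tokens with markup, script contents, and the listed stop words removed. A regex tokenizer and a hand-written stop word list are deliberate simplifications; stemming, lemmatization, and POS tagging need a dedicated NLP library, as in notebook 4.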


## 🖼️ Figures

Color figures as requested by the readers. 

![figure](https://github.com/practical-nlp/practical-nlp-figures/raw/master/figures/2-1.png)
![figure](https://github.com/practical-nlp/practical-nlp-figures/raw/master/figures/2-2.png)
![figure](https://github.com/practical-nlp/practical-nlp-figures/raw/master/figures/2-3.png)
![figure](https://github.com/practical-nlp/practical-nlp-figures/raw/master/figures/2-4.png)
![figure](https://github.com/practical-nlp/practical-nlp-figures/raw/master/figures/2-5.png)
![figure](https://github.com/practical-nlp/practical-nlp-figures/raw/master/figures/2-6.png)
![figure](https://github.com/practical-nlp/practical-nlp-figures/raw/master/figures/2-7.png)
![figure](https://github.com/practical-nlp/practical-nlp-figures/raw/master/figures/2-8.png)
![figure](https://github.com/practical-nlp/practical-nlp-figures/raw/master/figures/2-9.png)
![figure](https://github.com/practical-nlp/practical-nlp-figures/raw/master/figures/2-10.png)
![figure](https://github.com/practical-nlp/practical-nlp-figures/raw/master/figures/2-11.png)
![figure](https://github.com/practical-nlp/practical-nlp-figures/raw/master/figures/2-12.png)
![figure](https://github.com/practical-nlp/practical-nlp-figures/raw/master/figures/2-13.png)
![figure](https://github.com/practical-nlp/practical-nlp-figures/raw/master/figures/2-14.png)
![figure](https://github.com/practical-nlp/practical-nlp-figures/raw/master/figures/2-15.png)



