Text Analysis in Python

Welcome to the Text Analysis workshop for Python! Below is the list of lessons, each with a brief summary.

Prerequisites

Python experience is required for this workshop.

Schedule

	Setup	Download files required for the lesson
00:00	1. Introduction to Natural Language Processing	What is Natural Language Processing? What tasks can be done by Natural Language Processing? What does a workflow for an NLP project look like?
00:35	2. Vector Space and Distance	How can we model documents effectively? How can we measure similarity between documents? What’s the difference between cosine similarity and distance?
01:15	3. Preparing and Preprocessing Your Data	How can I prepare data for NLP? What are tokenization, casing and lemmatization?
01:35	4. Document Embeddings and TF-IDF	TODO
02:15	5. Latent Semantic Analysis	TODO
02:55	6. Intro to Word Embeddings	How can we extract vector representations of individual words rather than documents? What sort of research questions can be answered with word embedding models?
03:35	7. The Word2Vec Algorithm	How does the Word2Vec model produce meaningful word embeddings? How is a Word2Vec model trained?
04:20	8. Training Word2Vec	TODO
05:00	9. Ethics and Text Analysis	TODO
05:40	10. LLMs and BERT Overview	What is a large language model? What is BERT? How does attention work?
06:20	11. APIs	How do I know what data I can use for my corpus? How can I use an API to acquire data?
07:00	12. APIs	How do I know what data I can use for my corpus? How can I use an API to acquire data?
07:40	13. BERTClassifier	TODO
08:20	14. BERTIntro	TODO
09:00	15. IntroToTasks	TODO
09:40	16. LSA	TODO
10:20	17. Preprocessing	TODO
11:00	18. VectorSpace	TODO
11:40	Finish

The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.