This lesson is still being designed and assembled (Pre-Alpha version)

Text Analysis in Python

Welcome to the Text Analysis workshop for Python! Below is the list of lessons, each with a brief summary of the questions it covers.

Prerequisites

Python experience is required for this workshop.

Schedule

Setup
    Download files required for the lesson
00:00  1. Introduction to Natural Language Processing
    What is Natural Language Processing?
    What tasks can be done by Natural Language Processing?
    What does a workflow for an NLP project look like?
00:35  2. Vector Space and Distance
    How can we model documents effectively?
    How can we measure similarity between documents?
    What’s the difference between cosine similarity and distance?
01:15  3. Preparing and Preprocessing Your Data
    How can I prepare data for NLP?
    What are tokenization, casing and lemmatization?
01:35  4. Document Embeddings and TF-IDF
    TODO
02:15  5. Latent Semantic Analysis
    TODO
02:55  6. Intro to Word Embeddings
    How can we extract vector representations of individual words rather than documents?
    What sort of research questions can be answered with word embedding models?
03:35  7. The Word2Vec Algorithm
    How does the Word2Vec model produce meaningful word embeddings?
    How is a Word2Vec model trained?
04:20  8. Training Word2Vec
    TODO
05:00  9. Ethics and Text Analysis
    TODO
05:40  10. LLMs and BERT Overview
    What is a large language model?
    What is BERT?
    How does attention work?
06:20  11. APIs
    How do I know what data I can use for my corpus?
    How can I use an API to acquire data?
07:00  12. APIs
    How do I know what data I can use for my corpus?
    How can I use an API to acquire data?
07:40  13. BERTClassifier
    TODO
08:20  14. BERTIntro
    TODO
09:00  15. IntroToTasks
    TODO
09:40  16. LSA
    TODO
10:20  17. Preprocessing
    TODO
11:00  18. VectorSpace
    TODO
11:40  Finish

The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.