CS 395T Topics in Natural Language Processing (Fall 2023) UT Austin
Course Objective:
We will discuss emerging topics in natural language processing, focusing on understanding the development and use of large-scale language models. We will focus on four topics: learning objectives of LMs, model specialization, continual learning of LMs, and human-LM interaction. This course is designed for graduate students and highly motivated undergraduates who are interested in NLP research. This course aims to teach and practice:
- Cutting-edge research in natural language processing. You will learn about recent progress and remaining challenges.
- How to formulate and evaluate NLP problems and develop solutions for them.
- How to read and critique research papers and communicate research both orally and in writing.
Course Structure:
This is a seminar course where most classes will consist of students presenting 1-2 recent research papers and leading discussions on them. We will follow a role-playing paper-reading seminar format. Each week, you are expected to read 2-4 papers carefully.
The course is designed as an active seminar course. You will get to know your classmates, discuss papers and approaches, and sometimes struggle to understand challenging papers. Ideally, you should be excited about the intricacies of language understanding and willing to independently explore the literature.
Everyone is required to read the assigned paper prior to class. A subset of the students will be assigned “presentation roles” on a rotating basis, for which they will need to come prepared. The presentation roles are designed to expose students to the different aspects of being a researcher. By the end of the course, students should be comfortable reading, reviewing, implementing and extending NLP papers.
Prerequisites / Intended audiences:
This is an advanced graduate-level course that assumes a background in machine learning and programming, and basic knowledge of natural language processing (equivalent to the CS 371 / CS 388 courses at UT). I don’t strictly enforce whether you have taken these courses, but you should be comfortable with what’s taught in them. If you are looking for a lecture-based course with a structured, instructor-driven overview of NLP, this is probably not the right course for you. Consider taking CS 371 (undergraduate) or CS 388 (graduate) instead. You should be comfortable digesting a research paper from ML/NLP venues.
Note: If you are unclear whether you meet these requirements, please consult the instructor in advance. Auditing is not allowed unless previously discussed with the instructor.
Logistics:
Course time/location: Tuesdays / Thursdays 3:30-4:45PM at RLP 0.104
Office hours:
Instructor: Eunsol Choi (eunsol@utexas.edu)
- Time: Wednesday 1:00-2:00PM (or by appointment), Location: GDC 3.810
TA: Fangyuan Xu (fangyuan@utexas.edu)
- Time: Tuesday 10:30-11:30AM, Location: GDC 3.802G
Required Materials / Equipment:
- You will need access to a laptop (to log in to Zoom for online classes, to make presentations, etc.). There is no textbook.
Grading / Workload
Component | % of grade | Due Dates |
Class presentation | 25% | |
Discussion Lead Sign-up | | Aug 25th |
Discussion Lead | 15% | throughout the semester |
Final Project Presentation | 10% | Nov 28th, 30th |
Class participation | 25% | |
Provide feedback on classmates’ intermediate check-ins | 7% | |
Participation in-class / attendance / role-playing | 18% | throughout |
Mini Writing Assignment | 10% | Sept 7th |
Final Project | 40% | |
One-page Check-in | 5% | |
Intermediate Check-in (2-3 pages) | 5% | Oct 31st |
Final Write-up | 30% | |
In-class presentation (25% of grade)
- Once or twice during the semester, you will prepare a presentation and lead a discussion of a paper. The paper will be selected by the instructor, but you can let the instructor know if there is a particular paper you’d like to present. You will prepare about 30 minutes of material.
- Discussion Lead (15% of grade) Your presentation will be graded based on:
- Clarity / Coherence: Each presentation should aim to be relatively self-contained.
- Comprehensiveness: Covers the core contributions of the paper clearly and presents the work in the context of existing work.
- I highly encourage making slides in Google Slides (template here), as it’s easiest to share. You can also use other tools and send the slides in PDF form to the instructor and TA at least 24 hours before class time.
- Final Project Presentation (10% of grade)
- This will happen in the last week of class, when you will give a short (10-minute) presentation of one of the writing assignments you have done this semester.
Class participation (25% of grade)
- Provide feedback on classmates’ writing assignments (7% of grade)
- You will provide detailed feedback on classmates’ writing assignments.
- You will be given the draft one week before the final deadline and should return your feedback within three days, so that the authors have time to incorporate it.
- Participation in class discussion (18% of grade)
- For this seminar class, attendance is mandatory and sessions will not be recorded.
- You will sign up to help the presenter 3-4 times during the semester by taking on one of the “roles” that the presenter selects.
- Teacher: Your role is to help people who do not have the background to understand this paper. Imagine a use case where you are building a collaboration with people from different disciplines (UX designers, medical doctors, social scientists, etc.) or from industry who would be using this technology, and explain the paper in a manner they can understand.
- Peer Reviewer: The paper has not been published yet and is currently submitted to a top conference where you’ve been assigned as a peer reviewer. Complete a full review of the paper, answering all prompts of the official review form of the top venue in this research area. This includes recommending whether to accept or reject the paper.
- Archaeologist: This paper was found buried underground in the desert. You’re an archaeologist who must determine where this paper sits in the context of previous and subsequent work. Find and report on one older paper cited within the current paper that substantially influenced it and one newer paper that cites the current paper.
- Hacker: You’re a hacker who needs a demo of this paper ASAP. Implement a small part or simplified version of the paper on a small dataset or toy problem. Prepare to share the core code of the algorithm with the class and demo your implementation. Do not simply download and run an existing implementation, though you are welcome to use (and give credit to) an existing implementation for “backbone” code.
Mini Writing Assignment (10% of total grade)
- This assignment must be done individually.
- In this assignment, you will reflect on the ethical and societal impact of NLP technology. To this end, you will complete two short writing assignments:
- Write a version of an “AI/NLP Researcher’s Oath”, describing standards and principles for people developing AI/NLP technology. You can think of this as something similar to the Hippocratic Oath for doctors. The format is flexible, but you should aim to cover the various axes of ethical codes that are relevant for AI researchers in this day and age (up to 300 words).
- Find one article from the media discussing LLMs and their impact on society, and write a short commentary on it (up to 300 words). What do you think about the viewpoint presented by the journalist? Do you think they interviewed adequate parties? What is missing from the article?
Final Project (40% of the total grade)
- Overall guideline:
- You will complete one assignment, chosen from three options, over the course of the semester.
- Some writing assignments can be done in pairs. You can have different partners for different assignments.
- For each writing assignment, you will submit a draft version one week before the deadline, which will be reviewed by your classmates (see class participation section).
- In the final week of the semester, you will present one of your writing assignments to classmates by making a short presentation.
- You have to decide between the three options by September 28th.
- More detailed guidelines will be provided later in the course.
- Track 1: Technical blog post (can be done in a group of 2-3 people)
- In this assignment, you are to pick a topic relevant to NLP and write a technical blog article about it. Your post should cover one to three papers in depth and should contain some novel analysis that goes beyond what’s already in those papers. This can involve reproducing previous results and running new analyses or code.
- You can look at ICLR blog track for inspiration: https://iclr.cc/Conferences/2023/CallForBlogPosts
- Track 2: Final Project (can be done in a group of 2-3 people)
- If you choose this option, you will design and pursue a final project that is relevant to natural language processing. You can choose any topic in NLP (whether or not it is covered in class).
- You can refer to ACL proceedings or other ML conference proceedings (NeurIPS, ICLR, ICML) for inspiration. Your project can focus on either:
- A new model architecture for existing problems (a variant of an existing model)
- A new training, optimization, or evaluation method for existing problems
- A new application of NLP technology -- here, you will apply an existing model to a new task. Please motivate the task carefully.
- Experimental and/or theoretical analysis of datasets, approaches, or models.
- Track 3: Final Project Proposal (individual)
- This option allows you to write a final project proposal without necessarily carrying out the proposed research yourself. You can think of this as “writing the paper first” practice!
- As this is only a proposal, you are not limited by computational or human resources. Class projects often have to be severely limited in scope. Here, you can dream big, assuming you have a large amount of compute and even access to forbidden weights. HOWEVER, you should justify your research idea clearly and carefully. Why would anyone want to execute this research if they had such resources to spare? What would it take to gather such resources? You should have a section that discusses the practical limitations of your proposal.
Course Policy / Logistics / Communication
- All assignments are due at midnight on the deadline date, to be submitted on Canvas (unless otherwise noted). You will have four slip days (each slip day is 24 hours) throughout the semester, which you can use for the mini writing assignment or final project; slip days do not apply to course presentations or course participation (including writing assignment feedback).
- By default, all classes will be in person and will not support online attendance. However, a few sessions will be held online (announced beforehand) if we have virtual guests or other special circumstances. I will not take formal attendance, but attendance will count towards the final class participation grade. It is okay to skip a few classes throughout the semester when you are traveling or sick (don’t show up sick), but if you consistently miss classes (or do not participate in class discussions), you will not receive full credit for class participation. Also, see the section on religious holy days below.
- We will use EdDiscussion for class communication (announcements, etc) and Canvas for assignment submissions.
Asking for help
The best way to reach the course staff is posting on EdDiscussion. If you cannot make the office hour time, I’d be happy to arrange another time if possible.
GRADE BREAKS
+/- grades will be used for the final class grade.
Grade | Cutoff |
A | 94% |
A- | 90% |
B+ | 87% |
B | 84% |
B- | 80% |
C+ | 77% |
C | 74% |
C- | 70% |
D+ | 67% |
D | 64% |
D- | 60% |
F | <60% |
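As a rough illustration of how the component weights in the grading table and the cutoffs above combine, here is a minimal Python sketch. It is not an official grade calculator: the function and key names are hypothetical, each component is assumed to be scored on a 0-100 scale, and each cutoff is assumed to be an inclusive lower bound with no rounding applied.

```python
# Weights (percent of final grade) copied from the grading table in this syllabus.
# The dictionary keys are illustrative names, not official identifiers.
WEIGHTS = {
    "discussion_lead": 15,
    "final_project_presentation": 10,
    "peer_feedback": 7,
    "in_class_participation": 18,
    "mini_writing_assignment": 10,
    "one_page_check_in": 5,
    "intermediate_check_in": 5,
    "final_write_up": 30,
}

# Letter-grade cutoffs from the table above, highest first.
CUTOFFS = [
    ("A", 94), ("A-", 90), ("B+", 87), ("B", 84), ("B-", 80),
    ("C+", 77), ("C", 74), ("C-", 70), ("D+", 67), ("D", 64), ("D-", 60),
]


def course_percentage(scores):
    """Weighted average of per-component scores, each on a 0-100 scale."""
    assert sum(WEIGHTS.values()) == 100  # the listed weights should cover the whole grade
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


def letter_grade(percentage):
    """Assumption: a cutoff is an inclusive lower bound and no rounding is applied."""
    for letter, cutoff in CUTOFFS:
        if percentage >= cutoff:
            return letter
    return "F"


# Hypothetical student: 90 on every component except 80 on the final write-up.
example_scores = {name: 90 for name in WEIGHTS}
example_scores["final_write_up"] = 80
total = course_percentage(example_scores)
print(total, letter_grade(total))  # 87.0 B+
```

Because the final write-up alone carries 30% of the grade, a 10-point drop on it moves the overall percentage by 3 points in this sketch.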
Topics that will be covered in this class
The paper and topic list below is tentative and will be finalized before the week of discussion. Each topic will be discussed for about 5-6 classes, and there will be 2-3 guest lectures.
Preliminary
Before starting the discussions, we will review recent papers describing architecture and training of base LMs. This will give us background to understand LLMs better for the semester.
Beyond Next Word Prediction — Learning Objective of LMs
While language models are known to be trained with “next word prediction”, most recent models are also trained with other objectives. In this part, we delve into these newer training objectives such as learning from human feedback, instruction tuning, etc.
Background:
- Improving language models by retrieving from trillions of tokens (Borgeaud et al., 2022)
High-level Discussion:
Data / Framework:
Alternative Learning Objectives (including non-RL):
Theory-of-Mind:
Specialization of LMs
The scaling law is undeniable in the recent era of language models. Can we transfer the power of large models to smaller models? We explore a few paths that have been proposed to distill a bigger model into a smaller model, including symbolic distillation (where larger models are used to generate text data).
Symbolic distillation
- Symbolic Knowledge Distillation: from General Language Models to Commonsense Models (West et al., NAACL 2022)
Specialization of LM pre-training
Adaptors / Domain specific modeling
High-level Discussion:
Adding actions to prediction space:
Human - LLM interaction
In this section, we will focus on how humans will interact with this emerging technology. What are the emerging use cases, and how do LMs shape human behavior? We will look into papers from the HCI literature as well as from NLP/ML venues.
High level Discussion
- Evaluating Human-Language Model Interaction (Lee et al, preprint, 2023)
Information Visualization:
- ScatterShot: Interactive In-context Example Curation for Text Transformation (Wu et al., IUI 2023)
Using LLMs
- Why Johnny Can't Prompt: How Non-AI Experts Try (and Fail) to Design LLM Prompts (Zamfirescu-Pereira et al., CHI 2023)
Impact on Humans
Knowledge Augmentation / Editing
Language models function as a knowledge base, using knowledge memorized during the pretraining stage. How can we inspect this knowledge and potentially update what is outdated? How can we inject new knowledge or remove outdated knowledge in LMs? This part could also touch upon the causality literature (which parts of the neural network parameters contain specific knowledge?). It will cover two paths for knowledge updating: retrieval-based knowledge augmentation and parameter update methods.
Knowledge Localization
Task-level Editing
Retrieval-based Augmentation
- Retrieval-Augmented Multimodal Language Modeling (Yasunaga et al., 2023)
Course Schedule
Accommodation
If you are a student with a disability, or think you may have a disability, and need accommodations please contact Disability and Access (D&A). You may refer to D&A website for contact and more information: http://diversity.utexas.edu/disability. If you are already registered with D&A, please deliver your Accommodation Letter to me as early as possible in the semester so we can discuss your approved accommodations.
The university is committed to creating an accessible and inclusive learning environment consistent with university policy and federal and state law. Please let me know if you experience any barriers to learning so I can work with you to ensure you have equal opportunity to participate fully in this course.
Honor code
The University of Texas at Austin strives to create a dynamic and engaging community of teaching and learning where students feel intellectually challenged; build knowledge and skills; and develop critical thinking, creativity, and intellectual curiosity. As a part of this community, it is important to engage in assignments, exams, and other work for your classes with openness, integrity, and a willingness to make mistakes and learn from them. The UT Austin honor code champions these principles:
I pledge, as a member of the University of Texas community, to do my work honestly, respectfully, and through the intentional pursuit of learning and scholarship.
The honor code affirmation includes three additional principles that elaborate on the core theme:
- I pledge to be honest about what I create and to acknowledge what I use that belongs to others.
- I pledge to value the process of learning in addition to the outcome, while celebrating and learning from mistakes.
- This code encompasses all of the academic and scholarly endeavors of the university community.
The honor code is more than a set of rules, it reflects the values that are foundational to your academic community. By affirming and embracing the honor code, you are both upholding the integrity of your work and contributing to a campus culture of trust and respect.
Academic Integrity
Students who violate University rules on academic misconduct are subject to the student conduct process. A student found responsible for academic misconduct may be assigned both a status sanction and a grade impact for the course. The grade impact could range from a zero on the assignment in question up to a failing grade in the course. A status sanction can range from a written warning, probation, deferred suspension and/or dismissal from the University. To learn more about academic integrity standards, tips for avoiding a potential academic misconduct violation, and the overall conduct process, please visit the Student Conduct and Academic Integrity website at: http://deanofstudents.utexas.edu/conduct.
Sharing of Course Materials is Prohibited
No materials used in this class, including, but not limited to, lecture hand-outs, videos, assessments (quizzes, exams, papers, projects, homework assignments), in-class materials, review sheets, and additional problem sets, may be shared online or with anyone outside of the class without my explicit, written permission. Unauthorized sharing of materials may facilitate cheating. The University is aware of the sites used for sharing materials, and any materials found online that are associated with you, or any suspected unauthorized sharing of materials, will be reported to Student Conduct and Academic Integrity in the Office of the Dean of Students. These reports can result in initiation of the student conduct process and include charge(s) for academic misconduct, potentially resulting in sanctions, including a grade impact.
Using Artificial Intelligence
The creation of artificial intelligence tools for widespread use is an exciting innovation. These tools have both appropriate and inappropriate uses in classwork. The use of artificial intelligence tools (such as ChatGPT) in this class:
- …is permitted for students who wish to use them, provided the content generated by AI is properly cited.
If you are considering the use of AI writing tools but are unsure if you are allowed or the extent to which they may be utilized appropriately, please ask.
For more information about AI in education, see the Center for Teaching and Learning’s “5 Things to Know about ChatGPT” webpage that includes additional suggested syllabi statements for your consideration.
Religious Holy Days
By UT Austin policy, you must notify me of your pending absence for a religious holy day as far in advance as possible of the date of observance. If you must miss a class, an examination, a work assignment, or a project in order to observe a religious holy day, you will be given an opportunity to complete the missed work within a reasonable time after the absence.
Names and pronouns
Class rosters are provided to the instructor with the student’s legal name, unless they have added a chosen name with the registrar’s office. If you have not yet done so, I will gladly honor your request to address you with the name and pronouns that you prefer for me to use for you. It is helpful to advise me of any changes or needs regarding your name and pronouns early in the semester so that I may make appropriate updates to my records and be informed about how to support you in this class.
- For instructions on how to add your pronouns to Canvas, visit this site.
- If you would like to update your chosen name with the registrar’s office, you can do so here, and reference this guide.
- For additional guidelines prepared by the Gender and Sexuality Center for changing your name on various campus systems, see the Resources page under UT Resources here.
Important Safety Information
Students in this class should be aware of the following university policies related to Texas’ Open Carry Law:
- Students in this class who hold a license to carry are asked to review the university policy regarding campus carry.
- Individuals who hold a license to carry are eligible to carry a concealed handgun on campus, including in most outdoor areas, buildings and spaces that are accessible to the public, and in classrooms.
- It is the responsibility of concealed-carry license holders to carry their handguns on or about their person at all times while on campus. Open carry is NOT permitted, meaning that a license holder may not carry a partially or wholly visible handgun on campus premises or on any university driveway, street, sidewalk or walkway, parking lot, parking garage, or other parking area.
- Per my right, I prohibit carrying of handguns in my personal office. Note that this information will also be conveyed to all students verbally during the first week of class. This written notice is intended to reinforce the verbal notification, and is not a “legally effective” means of notification in its own right.
TITLE IX Disclosure
Beginning January 1, 2020, Texas Education Code, Section 51.252 (formerly known as Senate Bill 212) requires all employees of Texas universities, including faculty, to report to the Title IX Office any information regarding incidents of sexual harassment, sexual assault, dating violence, or stalking that is disclosed to them. Texas law requires that all employees who witness or receive information about incidents of this type (including, but not limited to, written forms, applications, one-on-one conversations, class assignments, class discussions, or third-party reports) must report it to the Title IX Coordinator. Before talking with me, or with any faculty or staff member about a Title IX-related incident, please remember that I will be required to report this information.
Although graduate teaching and research assistants are not subject to Texas Education Code, Section 51.252, they are mandatory reporters under federal Title IX regulations and are required to report a wide range of behaviors we refer to as sexual misconduct, including the types of misconduct covered under Texas Education Code, Section 51.252. Title IX of the Education Amendments of 1972 is a federal civil rights law that prohibits discrimination on the basis of sex – including pregnancy and parental status – in educational programs and activities. The Title IX Office has developed supportive ways and compiled campus resources to support all impacted by a Title IX matter.
If you would like to speak with a case manager, who can provide support, resources, or academic accommodations, in the Title IX Office, please email: supportandresources@austin.utexas.edu. Case managers can also provide support, resources, and accommodations for pregnant, nursing, and parenting students.
For more information about reporting options and resources, please visit: https://titleix.utexas.edu, contact the Title IX Office via email at: titleix@austin.utexas.edu, or call 512-471-0419.
Campus safety
The following are recommendations regarding emergency evacuation from the Office of Emergency Management, 512-232-2114:
- Students should sign up for Campus Emergency Text Alerts at the page linked above.
- Occupants of buildings on The University of Texas at Austin campus must evacuate buildings when a fire alarm is activated. Alarm activation or announcement requires exiting and assembling outside.
- Familiarize yourself with all exit doors of each classroom and building you may occupy. Remember that the nearest exit door may not be the one you used when entering the building.
- Students requiring assistance in evacuation shall inform their instructor in writing during the first week of class.
- In the event of an evacuation, follow the instruction of faculty or class instructors. Do not re-enter a building unless given instructions by the following: Austin Fire Department, The University of Texas at Austin Police Department, or Fire Prevention Services office.
- For more information, please visit the Office of Emergency Management.
University Resources
For a list of university resources that may be helpful to you as you engage with and navigate your courses and the university, see the University Resources for Students Canvas page.