CS 395T Topics in Natural Language Processing (Fall 2023) UT Austin

Course Objective:

We will discuss emerging topics in natural language processing, focusing on understanding the development and use of large-scale language models. We will focus on four topics (learning objectives of LMs, model specialization, continual learning of LMs, and human-LM interaction). This course is designed for graduate students and highly motivated undergraduates who are interested in NLP research. This course aims to teach and practice:

Course Structure:

This is a seminar course where most classes will consist of students presenting 1-2 recent research papers and leading discussions on them. We will follow the role-playing paper-reading seminar format. Each week, you are expected to read 2-4 papers carefully.

The course is designed as an active seminar course. You will get to know your classmates, discuss papers and approaches, and sometimes struggle with challenging papers. Ideally, you should be excited about the intricacies of language understanding and willing to independently explore the literature.

Everyone is required to read the assigned paper before class. A subset of the students will be assigned “presentation roles” on a rotating basis, for which they will need to come prepared. The presentation roles are designed to expose students to the different aspects of being a researcher. By the end of the course, students should be comfortable reading, reviewing, implementing, and extending NLP papers.

Prerequisites / Intended audiences:

This is an advanced graduate-level course and assumes a background in machine learning and programming, plus basic knowledge of natural language processing (equivalent to the CS 371 / CS 388 courses at UT). I don’t strictly require that you have taken those courses, but you should be comfortable with what’s taught in them. You should also be comfortable digesting a research paper from ML/NLP venues. If you are looking for a lecture-based course with a structured, instructor-driven overview of NLP, this is probably not the right course for you; consider taking CS 371 (undergraduate) or CS 388 (graduate) instead.

Note: If you are unclear whether you meet these requirements, please consult the instructor in advance. Auditing is not allowed unless previously discussed with the instructor.

Logistics:

Course time/location: Tuesdays / Thursdays 3:30-4:45PM at RLP 0.104

Office hours:

Instructor: Eunsol Choi (eunsol@utexas.edu)

  • Time: Wednesday 1:00-2:00PM (or by appointment), Location: GDC 3.810

TA: Fangyuan Xu (fangyuan@utexas.edu)

  • Time: Tuesday 10:30-11:30AM, Location: GDC 3.802G

Required Materials / Equipment:


Grading / Workload

Component                                                  % of grade   Due dates
Class presentation                                         25%
  Discussion Lead Sign-up                                               Aug 25th
  Discussion Lead                                          15%          throughout the semester
  Final Project Presentation                               10%          Nov 28th, 30th
Class participation                                        25%
  Provide feedback on classmates' intermediate check-ins   7%           Nov 2nd / Nov 4th
  Participation in-class / attendance / role-playing       18%          throughout
Mini Writing Assignment                                    10%          Sept 7th
Final Project                                              40%
  One-page Check-in                                        5%           Sept 20th / Sept 28th
  Intermediate Check-in (2-3 pages)                        5%           Oct 31st
  Final Write-up                                           30%          Nov 21st / Dec 3rd

In-class presentation (25% of grade)

  • Once or twice during the semester, you will prepare a presentation and lead a discussion of a paper. The paper will be selected by the instructor, but you can let the instructor know if there is a particular paper you’d like to present. You will prepare about 30 minutes of material.
  • Discussion Lead (15% of grade) Your presentation will be graded based on:
    • Clarity / Coherence: Each presentation should aim to be relatively self-contained.
    • Comprehensiveness: Covers the core contributions of the paper clearly and presents the work in the context of existing work.
    • I highly encourage making slides in Google Slides (template here), as it’s easiest to share. But you can use other tools and send the slides in PDF form to the instructor and TA at least 24 hours before class time.
  • Final Project Presentation (10% of grade)
    • This will happen in the last week of class, where you will give a short (10-minute) presentation on one of the writing assignments you have done this semester.

Class participation (25% of grade)

  • Provide feedback on classmates’ writing assignments (7% of grade)
    • You will provide detailed feedback on classmates’ writing assignments.
    • You will be given the draft one week before the final deadline and should return your feedback within three days, so that the authors have time to incorporate it.
  • Participation in class discussion (18% of grade)
    • For this seminar class, attendance is mandatory and sessions will not be recorded.
    • You will sign up to help the presenter 3-4 times during the semester by taking one of the “roles” that the presenter selects.
      Teacher: Your role is to help people who do not have the background to understand this paper. Imagine a use case where you are building a collaboration with people from different disciplines (UX designers, medical doctors, social scientists, etc.) or from industry who would be using this technology, and explain the paper in a manner they can understand.
      
      Peer Reviewer: The paper has not been published yet and is currently submitted to a top conference where you’ve been assigned as a peer reviewer. Complete a full review of the paper, answering all prompts of the official review form of the top venue in this research area. This includes recommending whether to accept or reject the paper.
      
      Archaeologist: This paper was found buried underground in the desert. You’re an archaeologist who must determine where this paper sits in the context of previous and subsequent work. Find and report on one older paper cited within the current paper that substantially influenced it and one newer paper that cites the current paper.
      
      Hacker: You’re a hacker who needs a demo of this paper ASAP. Implement a small part or simplified version of the paper on a small dataset or toy problem. Prepare to share the core code of the algorithm with the class and demo your implementation. Do not simply download and run an existing implementation, though you are welcome to use (and give credit to) an existing implementation for “backbone” code.

Mini Writing Assignment (10% of total grade)

Final Project (40% of the total grade)

  • Overall guideline:
    • You will complete one assignment out of three options throughout the semester.
    • Some writing assignments can be done in pairs. You can have a different partner for each assignment.
    • For each writing assignment, you will submit a draft version one week before the deadline, which will be reviewed by your classmates (see class participation section).
    • In the final week of the semester, you will present one of your writing assignments to classmates by giving a short presentation.
    • You have to decide between the three options by September 10th / September 28th.
    • More detailed guidelines will be provided later in the course.
  • Track 1: Technical blog post (can be done in a group of 2-3 people)
    • In this assignment, you will pick a topic relevant to NLP and write a technical blog article about it. Your post should cover one to three papers in depth and should contain some novel analysis that goes beyond what’s already in those papers. This can involve reproducing previous results and running new analyses or code.
  • Track 2: Final Project (can be done in a group of 2-3 people)
    • If you choose this option, you will design and pursue a final project that is relevant to natural language processing. You can choose any topic in NLP (whether or not it is covered in class).
    • You can refer to ACL proceedings or other ML conference proceedings (NeurIPS, ICLR, ICML) for inspiration. Your project can focus on either:
      • A new model architecture for existing problems (a variant of an existing model)
      • A new training, optimization, or evaluation method for existing problems
      • A new application of NLP technology -- here, you will apply an existing model to a new task. Please motivate the task carefully.
      • Experimental and/or theoretical analysis of datasets, approaches, or models.
  • Track 3: Final Project Proposal (individual)
    • This option allows you to write a final project proposal without necessarily carrying out the proposed research yourself. You can think of this as “writing the paper first” practice!
    • As this is only a proposal, you are not limited by computational or human resources. Class projects often have to be severely limited in scope. Here, you can dream big, assuming you have a large amount of compute and even access to forbidden weights. HOWEVER, you should justify your research idea clearly and carefully. Why would anyone want to execute this research if they had such resources to spare? What would it take to gather such resources? You should have a section that discusses the practical limitations of your proposal.

Course Policy / Logistics / Communication

Asking for help

The best way to reach the course staff is by posting on EdDiscussion. If you cannot make the office hours, I’d be happy to arrange another time if possible.

GRADE BREAKS

+/- grades will be used for the final class grade.

Grade   Cutoff
A       94%
A-      90%
B+      87%
B       84%
B-      80%
C+      77%
C       74%
C-      70%
D+      67%
D       64%
D-      60%
F       <60%
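
As a rough illustration of how the component weights and the cutoffs fit together, here is a minimal Python sketch. The component names, the 0-100 scale per component, and the choice to treat cutoffs as inclusive with no rounding are all assumptions made for this example, not official course policy; the course staff determine final grades.

```python
# Component weights (percent of final grade), taken from the Grading / Workload table.
# The dictionary keys are hypothetical names chosen for this sketch.
WEIGHTS = {
    "discussion_lead": 15,
    "final_project_presentation": 10,
    "peer_feedback": 7,
    "participation": 18,
    "mini_writing_assignment": 10,
    "one_page_check_in": 5,
    "intermediate_check_in": 5,
    "final_write_up": 30,
}

# Letter-grade cutoffs from the table above, highest first.
CUTOFFS = [
    (94, "A"), (90, "A-"), (87, "B+"), (84, "B"), (80, "B-"),
    (77, "C+"), (74, "C"), (70, "C-"), (67, "D+"), (64, "D"), (60, "D-"),
]


def final_percentage(scores):
    """Weighted average of per-component scores, each assumed to be on a 0-100 scale."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


def letter_grade(percentage):
    """Map a final percentage to a letter grade.

    Assumption: a cutoff is inclusive (exactly 94 earns an A) and no rounding is applied.
    """
    for cutoff, letter in CUTOFFS:
        if percentage >= cutoff:
            return letter
    return "F"


if __name__ == "__main__":
    # Hypothetical student who scores 90 on every component.
    example_scores = {name: 90 for name in WEIGHTS}
    total = final_percentage(example_scores)
    print(total, letter_grade(total))
```

Under these assumptions, a student scoring 90 on every component ends up at exactly the A- cutoff.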

Topics that will be covered in this class

The paper and topic list below is tentative and will be finalized before the week of discussion. Each topic will be discussed for about 5-6 classes, and there will be 2-3 guest lectures.

Preliminary

Before starting the discussions, we will review recent papers describing the architecture and training of base LMs. This will give us the background to better understand LLMs throughout the semester.

Beyond Next Word Prediction — Learning Objective of LMs

While language models are known to be trained with “next word prediction”, most recent models are also trained with other objectives. In this part, we delve into these newer training objectives such as learning from human feedback, instruction tuning, etc.

Background:

High-level Discussion:

Data / Framework:

Alternative Learning Objectives (including non-RL):

Theory-of-Mind:

Specialization of LMs

The scaling law is undeniable in the recent era of language models. Can we transfer the power of large models to smaller models? We explore a few paths that have been proposed to distill a bigger model into a smaller one, including symbolic distillation (where larger models are used to generate text data).

Symbolic distillation

Specialization of LM pre-training

Adaptors / Domain specific modeling

High-level Discussion:

Adding actions to prediction space:

Human - LLM interaction

In this section, we will focus on how humans will interact with this emerging technology. What are the emerging use cases, and how do LMs shape human behavior? We will look at papers from the HCI literature as well as from NLP/ML venues.

High level Discussion

Information Visualization:

Using LLMs

Impact on Humans

Knowledge Augmentation / Editing

Language models function as a knowledge base, using knowledge memorized during the pretraining stage. How can we inspect this knowledge? How can we inject new knowledge or remove outdated knowledge in LMs? This part could also touch upon the causality literature (which parts of the neural network parameters contain specific knowledge?). This part will cover two paths for knowledge updating: retrieval-based knowledge augmentation and parameter update methods.

Knowledge Localization

Task-level Editing

Retrieval-based Augmentation


Course Schedule


Accommodation

If you are a student with a disability, or think you may have a disability, and need accommodations please contact Disability and Access (D&A). You may refer to D&A website for contact and more information: http://diversity.utexas.edu/disability. If you are already registered with D&A, please deliver your Accommodation Letter to me as early as possible in the semester so we can discuss your approved accommodations.

The university is committed to creating an accessible and inclusive learning environment consistent with university policy and federal and state law. Please let me know if you experience any barriers to learning so I can work with you to ensure you have equal opportunity to participate fully in this course.

Honor code

The University of Texas at Austin strives to create a dynamic and engaging community of teaching and learning where students feel intellectually challenged; build knowledge and skills; and develop critical thinking, creativity, and intellectual curiosity. As a part of this community, it is important to engage in assignments, exams, and other work for your classes with openness, integrity, and a willingness to make mistakes and learn from them. The UT Austin honor code champions these principles:

I pledge, as a member of the University of Texas community, to do my work honestly, respectfully, and through the intentional pursuit of learning and scholarship.

The honor code affirmation includes three additional principles that elaborate on the core theme:

The honor code is more than a set of rules; it reflects the values that are foundational to your academic community. By affirming and embracing the honor code, you are both upholding the integrity of your work and contributing to a campus culture of trust and respect.

Academic Integrity

Students who violate University rules on academic misconduct are subject to the student conduct process. A student found responsible for academic misconduct may be assigned both a status sanction and a grade impact for the course. The grade impact could range from a zero on the assignment in question up to a failing grade in the course. A status sanction can range from a written warning, probation, deferred suspension and/or dismissal from the University. To learn more about academic integrity standards, tips for avoiding a potential academic misconduct violation, and the overall conduct process, please visit the Student Conduct and Academic Integrity website at: http://deanofstudents.utexas.edu/conduct.

Sharing of Course Materials is Prohibited

No materials used in this class, including, but not limited to, lecture hand-outs, videos, assessments (quizzes, exams, papers, projects, homework assignments), in-class materials, review sheets, and additional problem sets, may be shared online or with anyone outside of the class without my explicit, written permission. Unauthorized sharing of materials may facilitate cheating. The University is aware of the sites used for sharing materials, and any materials found online that are associated with you, or any suspected unauthorized sharing of materials, will be reported to Student Conduct and Academic Integrity in the Office of the Dean of Students. These reports can result in initiation of the student conduct process and include charge(s) for academic misconduct, potentially resulting in sanctions, including a grade impact.

Using Artificial Intelligence

The creation of artificial intelligence tools for widespread use is an exciting innovation. These tools have both appropriate and inappropriate uses in classwork. The use of artificial intelligence tools (such as ChatGPT) in this class:

If you are considering the use of AI writing tools but are unsure if you are allowed or the extent to which they may be utilized appropriately, please ask.

For more information about AI in education, see the Center for Teaching and Learning’s “5 Things to Know about ChatGPT” webpage that includes additional suggested syllabi statements for your consideration.

Religious Holy Days

By UT Austin policy, you must notify me of your pending absence for a religious holy day as far in advance as possible of the date of observance. If you must miss a class, an examination, a work assignment, or a project in order to observe a religious holy day, you will be given an opportunity to complete the missed work within a reasonable time after the absence.

Names and pronouns

Class rosters are provided to the instructor with the student’s legal name, unless they have added a chosen name with the registrar’s office. If you have not yet done so, I will gladly honor your request to address you with the name and pronouns that you prefer for me to use for you. It is helpful to advise me of any changes or needs regarding your name and pronouns early in the semester so that I may make appropriate updates to my records and be informed about how to support you in this class.

Important Safety Information

Students in this class should be aware of the following university policies related to Texas’ Open Carry Law:

TITLE IX Disclosure

Beginning January 1, 2020, Texas Education Code, Section 51.252 (formerly known as Senate Bill 212) requires all employees of Texas universities, including faculty, to report to the Title IX Office any information regarding incidents of sexual harassment, sexual assault, dating violence, or stalking that is disclosed to them. Texas law requires that all employees who witness or receive information about incidents of this type (including, but not limited to, written forms, applications, one-on-one conversations, class assignments, class discussions, or third-party reports) must report it to the Title IX Coordinator. Before talking with me, or with any faculty or staff member about a Title IX-related incident, please remember that I will be required to report this information.

Although graduate teaching and research assistants are not subject to Texas Education Code, Section 51.252, they are mandatory reporters under federal Title IX regulations and are required to report a wide range of behaviors we refer to as sexual misconduct, including the types of misconduct covered under Texas Education Code, Section 51.252. Title IX of the Education Amendments of 1972 is a federal civil rights law that prohibits discrimination on the basis of sex – including pregnancy and parental status – in educational programs and activities. The Title IX Office has developed supportive ways and compiled campus resources to support all impacted by a Title IX matter.

If you would like to speak with a case manager, who can provide support, resources, or academic accommodations, in the Title IX Office, please email: supportandresources@austin.utexas.edu. Case managers can also provide support, resources, and accommodations for pregnant, nursing, and parenting students.

For more information about reporting options and resources, please visit: https://titleix.utexas.edu, contact the Title IX Office via email at: titleix@austin.utexas.edu, or call 512-471-0419.

Campus safety

The following are recommendations regarding emergency evacuation from the Office of Emergency Management, 512-232-2114:

University Resources

For a list of university resources that may be helpful to you as you engage with and navigate your courses and the university, see the University Resources Students Canvas page.