Date: Saturday, November 08, 2025 10:00 AM - Sunday, November 16, 2025 1:00 PM
Location: Online Live
Dr. Mark Davies

Using Large Online Corpora for Research, Teaching, and Learning

ENES 8656: SEMINAR 2

Professor:  Dr. Mark Davies (retired Professor, Brigham Young University, U.S.A.)

Credit hours:  1 credit hour

Schedule:  3-hour online Zoom sessions for four days

  • Saturday, November 8 from 10:00 to 13:00 (JST)
  • Sunday, November 9 from 10:00 to 13:00
  • Saturday, November 15 from 10:00 to 13:00
  • Sunday, November 16 from 10:00 to 13:00

Students taking this seminar for credit must attend all four days. 

Students can add or drop this seminar course until 14:00 on Saturday, November 8.

Advance sign-up (or course registration, for those taking this seminar for credit) is required for anyone attending the public session on Saturday, November 8 from 10:00 to 13:00. Sign-up must be completed through the "Distinguished Lecturer Series Seminar Sign-Up Form" available on the TUJ Grad Ed website. The sign-up deadline is Friday, November 7, at 12:00 p.m. The Zoom link for the public session will be sent by 18:00 on Friday, November 7 to those who have completed the online sign-up (or course registration) process.

This series of seminars will examine the many ways in which corpora can be used to enhance research, teaching, and learning. The seminars will be based primarily on the corpora from English-Corpora.org, which are perhaps the most widely used corpora currently available. In the seminars, we will consider the following topics (among others):

  • Basic corpus linguistics methodologies such as concordances (to examine the patterns in which words occur), collocates (to examine the meaning and usage of words and phrases), and n-grams (highly frequent strings of words). We will also focus on how this data can be used to improve teaching and learning.
  • Insights from corpora into word frequency (including variation by genre, dialect, and time period), and how this frequency data can be used in teaching and learning.
  • Keywords and “virtual corpora”, to focus on the vocabulary of particular domains (e.g. engineering, economics, or sports).
  • Insights into English grammar (again, including variation by genre, dialect, and time period), similar to what Biber et al. (1999) have done in the Longman Grammar of Spoken and Written English.
  • Using new features (released in summer 2025) that draw on AI/LLMs to analyze and classify the corpus data.
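
For readers new to the methods named in the first topic above, the following is a minimal Python sketch of what concordances, collocates, and n-grams compute. It is an illustration only and not the English-Corpora.org interface: the toy text and the function names (concordance, collocates, ngrams) are assumptions made for this example, and real corpus tools add tokenization, part-of-speech tagging, and statistical measures on top of raw counts.

```python
from collections import Counter

# Toy text standing in for a corpus (hypothetical; not taken from any real corpus).
TEXT = (
    "the strong tea was served with strong coffee and the strong wind "
    "blew while the heavy rain fell and the heavy traffic slowed"
)
TOKENS = TEXT.split()


def concordance(tokens, node, width=3):
    """Return keyword-in-context lines for every occurrence of `node`."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}")
    return lines


def collocates(tokens, node, span=1):
    """Count the words that occur within `span` tokens of `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
            counts.update(window)
    return counts


def ngrams(tokens, n=2):
    """Count every contiguous sequence of `n` tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


if __name__ == "__main__":
    for line in concordance(TOKENS, "strong", width=2):
        print(line)
    print(collocates(TOKENS, "strong").most_common(3))
    print(ngrams(TOKENS, 2).most_common(3))
```

Comparing the collocates of "strong" and "heavy" in this toy text hints at how collocate lists can separate near-synonyms, which is the kind of question the seminars address with far larger data.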