Description
Text algorithms are essential in many areas of science and information processing. Very often the information is represented as a written text in the form of, e.g., newspapers, books, computer hard-drives, optical disks, the genetic information of living organisms, etc. Specific text algorithms are needed when processing the information, whether in text editors or in (web)-searching engines. The intricacies of text algorithms represents a major topic of investigation in computer science. This course will present several advanced, practically relevant text algorithms and their interconnections. They include pattern matching and data compression algorithms, detection of periodicities, repetitions and symmetries in texts, and other algorithms.
Content
- Pattern-matching algorithms: Knuth-Morris-Pratt algorithm, Boyer-Moore algorithm
- String alignment algorithms: edit distance, optimal alignment, longest common subsequence, alignments with gaps
- Approximate patter-matching algorithms
- Suffix trees and text search applications
- Symmetries and repetitions in texts
- Constant-space algorithms
- Text compression algorithms
Prerequisites
- Algorithmics
- Data structures
- Note:
- the prerequisites are preferable, though not compulsory. The course is self-contained, all the background will be introduced
Course format
- Lectures: 28h, 14 lectures (including several exercise sessions)
- Final examination:
- 30 points (without bonus from exercises) – pass, mark 1/5
- over 27 points -, mark 5/5
- Bonus points
- a maximum of 5 bonus points in the exam can be gained from completing 80% of the exercises
- 1 bonus point can be gained for completing the custom-made “feedback form” (to be electronically filled, printed, and collected “on paper” only on 18- and 20.12.2018)
Literature
- Maxime Crochemore, Wojciech Rytter, Jewels of stringology, World Scientific, 2002.
- Maxime Crochemore, Christophe Hancart, Thierry Lecroq, Algorithms on strings, Cambridge University Press, 2001.
- Dan Gusfield, Algorithms on strings, trees, and sequences: computer science and computational biology. Cambridge University Press, 1999.
Credits
- 5sp
Time schedule
- Start date: 30st of October, 2018
- End date: 18th of December, 2018
- Tuesdays:
- 15-17, 115A, Agora
- Thursdays:
- 10-12, 115A, Agora
- Course exam:
- 20th of December, 2018
- ( Dept.’s Exam date in January, )
Lecturer
- Eugen Czeizler,
- Faculty of Science and Engineering,
- Åbo Akademi University,
- firstname.lastname at abo.fi
Lecture Slides
(preliminary, to be updated during the course)
- Lecture_1: Preliminaries
- Lecture_2: Pattern matching algorithms. KMP Algorithm
- Lecture_3: Pattern matching algorithms. Boyer-Moore Algorithm
- Lecture_4: Suffix trees
- Lecture_5: Suffix trees(2) and applications
- Lecture_6: Alignments
- Lecture_7: Approximate pattern matching, local alignments and alignments with gaps
- Lecture_8: Multiple alignments
- Lecture_9: Database searches (Bonus Topic)
- Lecture_10: Detecting text regularities
- Lecture_11: Compression algorithms (Bonus Topic)
Exercises
Exercises_1: Exercise Set 1 (due 8.11.2018)
Exercises_2 Exercise Set 2 (due 27.11.2018)
Exercises_3 Exercise Set 3 (due 18.12.2018)
Notes:
NOTE 1: Tuesday, 11.12.2018, there will be no lecture/exercises.
NOTE 2: For acquiring 1/30 points in the exam fill in and submit the anonymous feedback form bellow. The forms will be collected on 18.12.2018, and during any of the exam days for this course. Make sure that the exam supervisor marks you for submitting the form (he might not know about it).
NOTE3: The exams are “open book”. Students are allowed to have lecture notes or other material for the course, both on paper or electronic, i.e., use of computers is allowed for this exam
NOTE4: The Bonus Topics (i.e., Lecture 9 and Lecture 11) are not covered in the exam.
Exams:
First exam day: 20.12.2018, from 10 am to 1 pm, room 115A (Agora Building)
Second exam day: 11.01.2019