Regular expressions

Marek Pawelec

5 & 6 July 2021, 10:00-16:00 CET

Course content

Regular expressions (regexes for short) are patterns that describe text and/or numbers meeting certain criteria, so that the matches can be found and processed in a useful way. Examples of “something useful” include finding all dates in a given format (e.g. 03/24/2013) and converting them to a different format (e.g. 24.03.2013), converting number formats, removing multiple spaces from a whole document in one go, or joining incorrectly split paragraphs.
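
As a rough illustration of the tasks mentioned above, here is a minimal Python sketch. It is not taken from the workshop materials: the function names are hypothetical, the date pattern assumes two-digit months and days with a four-digit year, and the paragraph-joining rule is a simple heuristic (a line break not preceded by sentence-ending punctuation and followed by a lowercase letter) that will not suit every file.

```python
import re


def convert_us_dates(text: str) -> str:
    """Rewrite MM/DD/YYYY dates as DD.MM.YYYY by swapping captured groups."""
    return re.sub(r"\b(\d{2})/(\d{2})/(\d{4})\b", r"\2.\1.\3", text)


def collapse_spaces(text: str) -> str:
    """Replace every run of two or more spaces with a single space."""
    return re.sub(r" {2,}", " ", text)


def join_split_paragraphs(text: str) -> str:
    """Join lines broken mid-sentence (heuristic, see the assumptions above)."""
    return re.sub(r"(?<![.!?:])\n(?=[a-z])", " ", text)


if __name__ == "__main__":
    print(convert_us_dates("Delivered on 03/24/2013."))
    print(collapse_spaces("Too   many    spaces  here."))
    print(join_split_paragraphs("This sentence was\nsplit in two.\nNew paragraph."))
```

The same search and replacement patterns can usually be adapted to the Find and Replace dialogs of text editors and CAT tools, although the syntax for referring to captured groups differs between tools.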

Regular expressions are extremely useful for processing text and numbers when preparing a text for translation, for example when cleaning up text extracted from PDF files, and in the translation process itself: regexes can be used in SDL Studio and memoQ to perform a variety of actions. While it is usually relatively simple to create a regex that matches a particular text, the trick is often to write one that matches only that text and nothing more.
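
The difference between matching a text and matching only that text can be sketched as follows. This is an illustrative Python example under stated assumptions: the sample sentence, the pattern names and the part number are invented, and the stricter pattern still accepts impossible dates such as 02/31/2013.

```python
import re

# Loose pattern: any three groups of digits separated by slashes.
LOOSE_DATE = re.compile(r"\d+/\d+/\d+")

# Stricter pattern: month 01-12, day 01-31, four-digit year, on word boundaries.
STRICT_DATE = re.compile(r"\b(?:0[1-9]|1[0-2])/(?:0[1-9]|[12]\d|3[01])/\d{4}\b")

SAMPLE = "Shipped 03/24/2013, part no. 7/1234/55."


def find_all(pattern: re.Pattern, text: str) -> list[str]:
    """Return the full text of every match of pattern in text."""
    return [m.group(0) for m in pattern.finditer(text)]


if __name__ == "__main__":
    print(find_all(LOOSE_DATE, SAMPLE))   # also picks up the part number
    print(find_all(STRICT_DATE, SAMPLE))  # only the date
```

The loose pattern is quicker to write but would also convert the part number if used in a replacement, which is exactly the kind of over-matching a more specific pattern avoids.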

The problem most people have with regexes is that they look somewhat scary and mysterious. In reality, once you know the meaning of the symbols used and some basic rules, regexes are mostly quite simple and logical. The workshop is designed to introduce regular expressions to anyone without prior knowledge and to provide help and inspiration for people with basic to intermediate knowledge. We will progress from very basic to relatively complex rules, with an emphasis on translation-related applications based on real-life problems and files. After the workshop you should be able to use regexes for efficient text processing and to create or modify rules that match complex text strings.

Topics covered

  • Text editing in MS Word.
  • Editing a range of text formats in Notepad++.
  • Converting text into tags.
  • Using auto-translatable elements.
  • Creating and editing segmentation rules.
  • Using regular expressions for filtering in CAT tools and Find and Replace.
  • Defining custom QA rules in CAT tools and QA software.
  • Defining filters for importing non-standard files into memoQ, SDL Studio, Wordfast and open-source tools.

Participants will receive handouts with regular expression vocabulary and a detailed description of all rules created and used during the training.

Who should attend?

  • Translators who want to learn regular expressions from scratch or to extend their existing knowledge with additional regex options.

Event details

Date: 5 and 6 July 2021
Location: Utrecht – Address to be announced soon
Time: From 10:00 to 16:00 CET
Early-bird price: No early-bird price, as we decided to hold the workshop less than a week before the actual date.
Regular price: €349.00 (excluding VAT, includes light lunch, tea/coffee)
Student discount: 20%
Registration: Registration for this past event is no longer possible.
Max. number of attendees: The workshop is for a maximum of 12 attendees.
Remarks: If we cannot hold the workshop live, we will reschedule it to a date approximately three months later. As both Marek and Ellen will have had both vaccinations by then, this is unlikely.