Wednesday, 25 April 2012
Friday, 2 March 2012
Is this another nail in the coffin of freelance translation as a career?
A recent article on the blog of the Translation Automation User Society (TAUS) does not hold out much hope for specialist translators. The title of the article is “Who gets paid for translation in 2020?”. I would love to quote the author of this article by name, but no name is given. Perhaps this is a model article, generated by a computer, untouched by human hand. This would graphically illustrate the creed which underlies the article:
“In 2020 words are ‘free’. Almost every word has already been translated before. Our words will be stored somewhere and used again, legitimately in the eyes of the law or not. .... Even today ‘robots’ are crawling websites to retrieve billions of words that help to train machine translation engines. The latent demand for translation created by unprecedented globalization is making piracy an act of common sense.”
The TAUS vision paints a glowing picture of a completely automated future, with instant computerised translation in every hand-held device, every computer application and on every website, without any need for specialist intervention. To achieve this, TAUS aims to build up a database of all the translation work done in the world. It seems to envisage three methods to do this:
BEG, SCAVENGE and STEAL
BEG: In conference lectures, blog articles and other publications, TAUS calls on translators to donate their translations to its central database. The reward for doing this is to know that we are contributing to the BRAVE NEW WORLD of global computerised translation. There may be some payback in the form of access to databases provided by others, but the rhetoric of the begging prose is that we should contribute for free to the ideal of a humanity without language barriers.
SCAVENGE: The above quote speaks of the “robots” which are retrieving billions of translated words to train machine translation engines. But a scavenger takes everything that it can find. A scavenger cannot afford to be fussy about quality. There are two experts in the industry who have important things to say about this. The first is Kirti Vashee, who writes the blog eMpTy Pages. Kirti is an ardent advocate of machine translation, but he insists that the data used to train the translation engines must be of extremely high quality. The danger of the TAUS vision of innumerable robots scavenging for more and more data is that this can include lots of low-quality data, so the resulting translations will be inherently problematic. The other expert is Miguel Llorens, a highly insightful freelance translator who ridicules many of the assumptions of the machine translation gurus and elegantly criticises buzzwords such as the “content tsunami” and “crowdsourcing”.
As an aside, Kirti and Miguel disagree on many things; I suppose it is not often that they are recommended together as two leading experts in the debate on machine translation.
STEAL: It has often been suggested that Internet giants such as Google and Facebook are in fact data-gobbling monsters which think nothing of violating data protection standards. But at least in their public statements, they usually claim to respect the privacy of their users and to comply with data protection laws. Not so TAUS. In the above quotation, TAUS explicitly suggests that piracy is “an act of common sense”. I wonder if the similarity to the confiscation of private assets in the ideology of Marx, Stalin and others is merely accidental. Brave new world indeed!
Translation and my grandchildren
By the time the brave new world predicted by TAUS comes to pass (2020), my own translation career will be drawing to a close, or will perhaps already have ended. But what about my wonderful grandchildren? They will be on the threshold of their working lives (and some will still be in primary school). What should I tell them if they ask about translation as a career?
I will say: “Why not - if that is what you are really good at.” Of course I will point out the general principles of working in a career like translation: real language expertise in two languages, realistic self-appraisal and self-management, translating skills, the need for solid specialisation, how to use the tools of the trade (including computer-aided translation and various forms of machine translation), how to advertise and find customers and much more.
This is because I essentially do not accept the TAUS creed that “Almost every word has already been translated before”. Even at the word level, in my work I regularly come across newly created terms or compound words (German legal and architectural prose has an amazing level of inventiveness in this respect). And at the sentence level, every language on earth has an incredible potential for creative new combinations of ideas and even new linguistic structures - after all, I believe that we are still building the tower and city of Babel.
Tuesday, 17 January 2012
1. There are three types of memory:
The TM (Translation Memory), the TB (Termbase) and the lexicon for each project.
- The TM is a database where you can save the sentences from your source text together with your finished translation.
- The TB is a terminology database which you can use for single words or whole phrases.
- The lexicon is a database which only applies to the individual project. For every project file you can create a new lexicon.
2. Big Mama and Big Papa:
You can keep all of your work in just one TM (“Big Mama”) and one TB (“Big Papa”). If you are careful to give your entries the appropriate subject and client codes, DVX2 will take these codes into account when suggesting translations from your databases. My main TM contains about 40,000 sentence pairs accumulated over 12 years, and my main TB has about 55,000 entries.
3. Separate TMs and TBs:
In DVX2 Professional you can have up to 5 TMs and 5 TBs open in any project, and DVX2 Workgroup has no such limit. So you can use your Big Mama/Papa together with external databases, e.g. a TM or terminology list provided by the client, general reference material such as the EU DGT database, or terminology lists from major enterprises such as Microsoft or SAP, or from various banks. You may even decide to keep separate databases for different subjects or clients instead of a Big Mama or Big Papa. You may feel that this is safer if you work on texts for competing engineering or IT firms which deliberately use different terminology for their own brands. The drawback is that it may be more difficult to access all of your reference material, for example if you know that you have dealt with a term or sentence in DVX2 but can’t remember which database you were using at the time.
4. Fuzzy matching:
You can allow DVX2 to find matching material which is not quite exact. Under Tools>Options>General you can set a percentage figure for the variants which DVX2 is allowed to find (= “Minimum Score”). The default setting is 75%, but depending on the type of inflections which occur in your languages it may be useful to set it to 50% or less. The percentage applies to both the TM and TB. It does not apply to the lexicon – only exact matches are found in the lexicon. And the “minimum score” does not affect the performance of the DVX2 functions DeepMiner and AutoWrite.
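A small Python sketch can make the effect of the minimum score concrete. It is not DVX2's matching algorithm, which this post does not describe. Instead, it uses the ratio from the standard library's `difflib.SequenceMatcher` as a stand-in similarity score. The three-entry termbase and the function names are hypothetical. The sketch shows that ordinary endings score well above 75%. A form with a changed stem (Häuser/Haus) falls below 75% but is still found at 50%.

```python
from difflib import SequenceMatcher


def fuzzy_score(a: str, b: str) -> float:
    """Stand-in similarity in percent (difflib ratio, case-insensitive).

    DVX2 computes its own score; the numbers here are only illustrative.
    """
    return 100 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def fuzzy_lookup(term: str, entries: dict[str, str], min_score: float = 75) -> list[tuple[str, str, float]]:
    """Return (source, target, score) for every entry at or above min_score, best first."""
    hits = []
    for source, target in entries.items():
        score = fuzzy_score(term, source)
        if score >= min_score:
            hits.append((source, target, score))
    return sorted(hits, key=lambda hit: hit[2], reverse=True)


# Hypothetical mini-termbase holding one base form per word
termbase = {"öffentlich": "public", "Gebäude": "building", "Haus": "house"}

for word in ["öffentlichen", "Gebäudes", "Häuser"]:
    for min_score in (75, 50):
        targets = [target for _, target, _ in fuzzy_lookup(word, termbase, min_score)]
        print(f"{word} at {min_score}%: {targets}")
```

With this stand-in score, “öffentlichen” and “Gebäudes” clear 75% easily. “Häuser” only reaches its base form at the lower setting. The actual percentages DVX2 reports will differ.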
5. Adding new entries:
This is very quick and easy in DVX2. For the TM you enable AutoSend (either with the tick box at Tools>Options>Environment, or via the icons at the bottom of the DVX2 window, where AutoSend is the second icon from the right). Then all you need to do is press CTRL-DownArrow when you have finished each segment. For the lexicon, highlight the word or phrase in the source and target text, then press F10. For the TB, again highlight the word or phrase in the source and target text, then press F11. This opens a window for the new termbase entry.
Here you can edit the term in either language to add or remove declensions, correct spelling problems etc. You can check that the terms are marked with the right subject and client codes. There are additional fields, too (Definition, Part of Speech, Gender, Number, and you may also see a field called Context). I have not yet seen any reason to use any of these fields, although some users may have found ways to do so.
The termbase (TB) is one of the keys to productivity in DVX2. It is advisable to add words, and even whole phrases, as often as you can. Some users make a point of adding a TB entry from every single sentence they translate. Steven Marzuola’s article about using the terminology database was based on the previous version of DVX (now often called DVX1), but it offers great advice which is also relevant to DVX2.
6. Subject and client codes:
These are important, because DVX2 refers to them when it decides what material to offer for your current translation. When you first install DVX2, you will see a suggested list of subjects, but you can easily delete it and create your own list if you prefer. Each subject consists of a short index code (e.g. 435) and a descriptive text (e.g. Regional planning/ecology). When DVX2 decides how close an entry’s subject is to your current project, it works hierarchically, so in this example it would consider entries with my subject codes 43 (Urban planning) and 4 (Building) to be closely related. You can use letters instead of numbers if this suits your work.
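The hierarchical idea can be illustrated with a minimal Python sketch. It measures how close two subject codes are by counting the leading characters they share. This prefix rule is an assumption made for illustration, not DVX2's documented weighting. The codes 435, 43 and 4 come from the paragraph above. Code 2, the function names and all the translations of the German word “Anlage” are hypothetical.

```python
def subject_affinity(entry_code: str, project_code: str) -> int:
    """Count the leading characters two hierarchical subject codes have in common."""
    shared = 0
    for a, b in zip(entry_code, project_code):
        if a != b:
            break
        shared += 1
    return shared


def rank_by_subject(entries: list[tuple[str, str]], project_code: str) -> list[tuple[str, str]]:
    """Sort (subject_code, target) pairs so the closest subject comes first.

    Assumption for illustration: a longer shared prefix means a closer subject.
    """
    return sorted(entries, key=lambda entry: subject_affinity(entry[0], project_code), reverse=True)


# Hypothetical TB entries for the German word "Anlage" under different subject codes
candidates = [("2", "annex"), ("4", "plant"), ("435", "grounds"), ("43", "facility")]

# Current project coded 435 (Regional planning/ecology)
for code, target in rank_by_subject(candidates, "435"):
    print(code, target, subject_affinity(code, "435"))
```

The codes are compared as strings, so letter codes work the same way as numbers. Python's sort is stable, so entries that are equally close keep their original order.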
7. Build lexicon:
This function, in the “Lexicon” menu, is sometimes useful when preparing for a job which is heavy on terminology. I use it for between 5% and 10% of my jobs. My procedure is as follows. First I call up “Build lexicon” and define the maximum number of words per phrase (usually 4). The program then takes a couple of minutes to extract candidate phrases. Then I open the lexicon (with the Project Explorer), click on the heading over the left-hand column and define the sort criteria: 1. Number of words (descending), 2. Frequency (descending). Then I go through the list manually from the top. First I decide which four-word phrases deserve a lexicon entry. This is usually only worthwhile for phrases which are meaningful in themselves and which occur frequently. When I get down to phrases which appear three times or fewer, I use the scroll bar to move down to the most frequent three-word phrases, and so on, until I have defined a number of lexicon entries. Finally I select “Remove entries” from the Lexicon menu, click on “Entries with empty targets” (i.e. the candidates I left untranslated) and then OK. Typically, this gives me between 30 and 50 lexicon entries for a job of several hundred segments. Because these entries occur frequently and require consistency, this preliminary step improves the results of Pretranslate or Assemble as I work on the job.
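The counting and sorting described above can be sketched in Python. This approximates the idea and is not DVX2's Build lexicon code. The sketch lower-cases the text, splits it into words with a simple regular expression, and counts every run of one to `max_words` words. It then sorts by number of words and frequency, both descending. The `min_freq=4` cut-off reflects moving on once a band reaches phrases that occur three times or fewer. The function name and sample segments are hypothetical.

```python
import re
from collections import Counter


def build_lexicon_candidates(segments: list[str], max_words: int = 4, min_freq: int = 4) -> list[tuple[str, int, int]]:
    """Count every run of 1..max_words consecutive words across the source segments.

    Returns (phrase, number_of_words, frequency) tuples, sorted by number of
    words (descending), then frequency (descending), then alphabetically.
    Phrases seen fewer than min_freq times are dropped.
    """
    counts: Counter[str] = Counter()
    for segment in segments:
        words = re.findall(r"\w+", segment.lower())
        for n in range(1, max_words + 1):
            for i in range(len(words) - n + 1):
                counts[" ".join(words[i:i + n])] += 1
    rows = [(phrase, len(phrase.split()), freq) for phrase, freq in counts.items() if freq >= min_freq]
    return sorted(rows, key=lambda row: (-row[1], -row[2], row[0]))


# Hypothetical source segments
segments = [
    "The public green spaces are open.",
    "Public green spaces must be kept clean.",
    "New public green spaces.",
    "Public green spaces and parks.",
]

for phrase, n_words, freq in build_lexicon_candidates(segments):
    print(f"{n_words} words, {freq}x: {phrase}")
```

Lower-casing merges German nouns with sentence-initial capitals. That helps the counting, but the real lexicon entries should keep their correct capitalisation.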
This function (Build lexicon) can also be used to identify terms that can be used for a terminology list to be delivered to the client if this is part of the client’s instructions for the job. Over the years I have only had one such project, but this may be relevant for translators who often work in highly technical fields.
8. Names, places and proprietary titles:
These are the classic elements which should be added to the lexicon. If you have a product name or number, this is normally only relevant to the job in hand. You do not usually want this term to occur in jobs for other clients. The same applies to the names of the people who work for the client. Therefore, such elements should only be sent to the lexicon, and not to the termbase. But some names occur so often that they may be useful in the TB. My general principle here: if names could be confused with actual words in the language, they are not suitable for the TB. So the common German name Helmut is not in my TB because, depending on the level of fuzzy matching, it could be confused with the word Helm=helmet (and the declined forms Helme/Helmen/Helmes). Similarly, the surname Kohl is not in the TB to avoid confusion with Kohl=cabbage (and the near-match Kohle=coal). But the two names together are in the TB – i.e. the former German Chancellor Helmut Kohl. And other famous politicians are there too with the spelling in German and English, such as Gorbatschow/Gorbachev.
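This rule of thumb can be checked roughly before a name goes into the TB. The Python sketch flags vocabulary words that a name matches at or above a given percentage. As a stand-in for DVX2's fuzzy score, it uses the ratio from the standard library's `difflib.SequenceMatcher`, so the real scores will differ. The vocabulary list and function names are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Stand-in fuzzy score in percent (difflib ratio, case-insensitive)."""
    return 100 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def collides_with_words(name: str, vocabulary: list[str], min_score: float = 75) -> list[str]:
    """Return the vocabulary words that the name matches at or above min_score percent."""
    return [word for word in vocabulary if similarity(name, word) >= min_score]


# Hypothetical vocabulary: Helm (helmet) and declined forms, Kohl (cabbage), Kohle (coal)
vocabulary = ["Helm", "Helme", "Helmen", "Helmes", "Kohl", "Kohle"]

for name in ["Helmut", "Kohl", "Helmut Kohl"]:
    print(name, collides_with_words(name, vocabulary), collides_with_words(name, vocabulary, 60))
```

With this stand-in score, “Helmut” already collides with “Helm” at 75%. Lowering the threshold to 60% adds the declined forms. The full name “Helmut Kohl” collides with none of them, which fits the reasoning above.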
9. Adapting your use of the databases to your languages:
In some cases, your language pair and translation direction will influence the way you use the different databases because of issues such as word order and inflection. One example of this is the English phrase “public green spaces”. In French the words come in a different order, e.g. “espaces verts publics”, and alternative wordings are possible, e.g. “espaces verts des lieux publics”, “espaces verts ouverts au public”, “espaces verts pour le public” etc. (Thanks to Dave Turner for providing these and other examples.) In German the first translation that comes to mind is “öffentliche Grünflächen”, although the first word could also be declined as “öffentlichen”.
If you are translating from French to English, you will probably want to enter each such French phrase as a single unit, especially if it occurs frequently in the type of text you deal with. Entering only the individual words does not help very much, because the word order must be changed. Depending on your type of work and the frequency of such phrases, you may decide to store them in the lexicon, the TB or the TM.
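A toy Python sketch shows why word-level entries fail for phrases like this. The sketch translates from left to right and always tries the longest glossary entry first. This greedy longest-match lookup was chosen for clarity and is not how DVX2 assembles its suggestions. The glossary entries and the function name are hypothetical.

```python
def translate_greedy(words: list[str], glossary: dict[str, str], max_len: int = 4) -> str:
    """Replace source words left to right, trying the longest glossary phrase first.

    Words with no entry are kept unchanged and marked with brackets.
    """
    out = []
    i = 0
    while i < len(words):
        for n in range(min(max_len, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in glossary:
                out.append(glossary[phrase])
                i += n
                break
        else:  # no entry of any length starts at this word
            out.append(f"[{words[i]}]")
            i += 1
    return " ".join(out)


# Hypothetical glossaries: single words only, and the same plus the whole phrase
word_level = {"espaces": "spaces", "verts": "green", "publics": "public"}
phrase_level = {**word_level, "espaces verts publics": "public green spaces"}

source = "les espaces verts publics".split()
print(translate_greedy(source, word_level))
print(translate_greedy(source, phrase_level))
```

With only the single words, the output keeps the French word order (“spaces green public”). Once the whole phrase is stored as a unit, the longer entry wins and the English order is correct.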
If you are translating from German, it is sufficient in this case to add the two words to the termbase and let DVX2 handle the endings as fuzzy matches. Even for phrases with a greater number of inflected variants, such as “public building” (“öffentliche Gebäude”, “öffentliches Gebäude”, “öffentlichen Gebäudes”, “öffentlichem Gebäude”), it is still possible to enter just one version of each word and rely on fuzzy matching. The advantage here is that although the German source is inflected, the English target phrase is not.
Translating from a largely uninflected language into inflected languages like French and German can be more complicated, so you will have to find a strategy which fits the languages that you work with. There is no single solution which will work for all languages and all subject areas, but DVX2 offers flexibility in the use of the databases.
10. Looking things up in the database:
There are various ways to access the information in your databases. The first is that DVX2 uses this information to compile its suggested translation (when you use the functions “Pretranslate”, “Assemble” or “AutoAssemble”). When you have done that, you will see that some words or phrases in the suggested translation are underlined in blue. These are terms for which your databases contain several possibilities. The second way is to right-click on such a word or phrase to see the other suggestions; you can examine these and select one with the mouse or by using the number shown. The third way to see the relevant content of your database is by looking at the “Portions” window or windows. There are several screenshots illustrating this here. The fourth way to look up the information is to use Scan (CTRL-S) to call up a concordance from the TM, or Lookup (CTRL-L) to see entries from the TB.
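A concordance search like the one Scan performs can be approximated as a case-insensitive substring search over the source side of the TM. The Python sketch below is a deliberate simplification. It models only exact substring hits, which is an assumption, not a description of how Scan works internally. The TM pairs and the function name are hypothetical.

```python
def concordance(tm: list[tuple[str, str]], query: str) -> list[tuple[str, str]]:
    """Return the (source, target) pairs whose source text contains the query.

    Matching is a plain case-insensitive substring test (str.casefold), so a
    stem also finds its inflected forms. Fuzzy hits are not modelled.
    """
    needle = query.casefold()
    return [(source, target) for source, target in tm if needle in source.casefold()]


# Hypothetical German-English TM entries
tm = [
    ("Die öffentlichen Grünflächen werden erweitert.", "The public green spaces will be extended."),
    ("Der Bebauungsplan gilt ab 2013.", "The development plan applies from 2013."),
    ("Öffentliche Grünflächen sind zu pflegen.", "Public green spaces must be maintained."),
]

for source, target in concordance(tm, "öffentliche"):
    print(f"{source}\n  -> {target}")
```

A search for a stem such as “öffentliche” also finds the inflected form “öffentlichen”. This is often useful when checking how a term was handled before.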
11. Moving databases to another computer:
If you need to move your work to a different computer, e.g. to work on a laptop while you are travelling, you will need to copy certain files to the other computer. The first file is your project file, which has the extension .dvprj. The project file contains the lexicon, so no special steps are needed to transfer the lexicon. The termbase is a single file with the extension .dvtdb. The TM consists of at least four files. The main content is in a file with the extension .dvmdb. Then there is an index file for each of your languages; my index files have the extensions en.dvmdi and de.dvmdi (for English and German). There is also a file with the extension .dvmdx. When you open the project on the other computer, DVX2 may complain that it cannot find the databases. But this is not a problem: when the project is open, you can select them with Project>Properties>Databases.
Another file which is worth moving to the other computer is the settings file with the extension .dvset. This contains your subject and client lists and various other settings. And don’t forget your dongle, or if you use an electronic licence key, make sure that the key will apply to the other computer.
12. How to find out more:
For more detailed information it is worth looking at the DVX2 User Guide for DVX2 Professional or DVX2 Workgroup. The link is at the bottom of the page, and the user guides are PDF files with over 600 pages. On the website http://www.atril.com there are also links to various videos, webinars and training courses, and also to the mailing list dejavu-l (under Support>Technical forum).
I already mentioned Steven Marzuola’s article on terminology databases. It is also worth looking at Nelson Laterman’s collection of tips and tricks for DVX1 (and even its predecessor DV3).
I am sure there are plenty of tips and questions which I have not covered, so I am looking forward to reading comments by my readers.