Forum: Wordfast support
Topic: Trados v Wordfast analysis - large discrepancy
Poster: Samuel Murray
Post title: Well, there's the answer...
[quote]Clive Phillips wrote:
1. The source text file type is prep.tag.ttx...
2. I assume that the agency used TRADOS to create the TM...[/quote]
So the file is not a native WFP file but a native Trados file, and the TM (including the tags in its segments) most likely matches the tagging used in the source text file. WFP, on the other hand, has to convert the file and its tags into its own format, and in doing so it tags the text differently. As a result, a segment in WFP may carry different tags from the very same segment as stored in the TM.
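To illustrate the effect, here is the same sentence tagged in two different ways. This is a minimal sketch only: the tag placeholders below are hypothetical (neither is the real Trados or WFP tag syntax), and Python's `difflib.SequenceMatcher` stands in for whatever fuzzy-match scoring WFP actually uses. The words are identical, yet the raw score drops below 100% purely because of the tags:

```python
import re
from difflib import SequenceMatcher

# Hypothetical placeholder syntaxes; neither is the real Trados or WFP tag format.
TAG_PATTERN = re.compile(r"</?\d+>|\{\d+\}")


def score(a: str, b: str) -> float:
    """Similarity as a percentage (difflib stand-in, not WFP's actual algorithm)."""
    return round(SequenceMatcher(None, a, b).ratio() * 100, 1)


def strip_tags(segment: str) -> str:
    """Remove the hypothetical tag placeholders, leaving only the text."""
    return TAG_PATTERN.sub("", segment)


# The same sentence as stored in the TM and as re-tagged after conversion.
tm_segment = "Click <1>here</1> to log in."
doc_segment = "Click {1}here{2} to log in."

raw = score(tm_segment, doc_segment)
text_only = score(strip_tags(tm_segment), strip_tags(doc_segment))

print(f"with tags:    {raw}%")        # below 100 although the words are identical
print(f"without tags: {text_only}%")  # 100.0
```

Once the tags are stripped, both segments reduce to the same text, so any shortfall from 100% comes from the tagging alone.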
[b]Here is a quick test to illustrate it[/b]
I downloaded ProZ.com's main page, converted it to TTX using Trados 2007, pre-translated it (source = target), and then cleaned up the file so that all segments were added to the TM. In other words, the TM is a 100% match for the document. The clean-up analysis shows exactly what we would expect:
[i]Total number of segments: 666
Total word count: 2962
Number of non-repeating segments: 477
Word count of non-repeating segments: 2157[/i]
Now, if WFP tagged exactly the same way as Trados, then it should be possible to load both the TTX file and the TM into WFP, run an analysis, and get nothing but 100% matches, right?
Alas, WFP's analysis shows this (without even calculating internal matches):
[i]Golden segments 426
Golden words 1340
Repetitions segments 36
Repetitions words 62
100% (words) 1340
95%-99% (words) 1168
85%-94% (words) 418
75%-84% (words) 5
50%-74% (words) 0
No Match (words) 36
Total segments 677
Total words 3029[/i]
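A quick tally of the figures above shows that the match bands add up to WFP's total, and that fewer than half of the words came through as 100% matches even though the TM matches the document exactly. This assumes, as the identical 1340-word figures suggest, that the Golden row overlaps the 100% row rather than adding to it:

```python
# Word counts copied from the WFP analysis above (Golden row left out,
# on the assumption that it is a subset of the 100% row).
wfp_words = {
    "Repetitions": 62,
    "100%": 1340,
    "95%-99%": 1168,
    "85%-94%": 418,
    "75%-84%": 5,
    "50%-74%": 0,
    "No Match": 36,
}

wfp_total_words = 3029
trados_total_words = 2962     # from the Trados clean-up analysis
wfp_total_segments = 677
trados_total_segments = 666

band_sum = sum(wfp_words.values())
# Words that were neither repetitions nor 100% matches.
below_100 = sum(v for k, v in wfp_words.items() if k not in ("Repetitions", "100%"))
share_100 = round(wfp_words["100%"] / wfp_total_words * 100, 1)

print(f"bands sum to {band_sum} of {wfp_total_words} words")
print(f"100% matches: {share_100}% of words; not matched at 100%: {below_100} words")
print(f"WFP counts {wfp_total_words - trados_total_words} more words "
      f"and {wfp_total_segments - trados_total_segments} more segments than Trados")
```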
Even if I deselect all penalties in WFP before running the analysis, I still get 443 words of high fuzzy matches, 55 words of medium-high fuzzy matches, and 36 words of no matches. So the penalties alone do not account for all the differences.
And the penalties are there to protect you. Without them, a segment like [Add 2 eggs and 300 grams of sugar] would be considered a 100% match with a segment like [Add 100 eggs and 2 grams of sugar], and a segment like [He heard a loud bang] would be considered a 100% match with [He heard a loud BANG].
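A minimal sketch of why those penalties matter, assuming a matcher that ignores numbers and letter case when comparing segments. The one-point penalty values and the function names are hypothetical placeholders, not WFP's actual settings or API:

```python
import re
from typing import Optional

NUMBER_PENALTY = 1  # hypothetical penalty (percentage points) for differing numbers
CASE_PENALTY = 1    # hypothetical penalty for differing letter case


def mask_numbers(segment: str) -> str:
    """Replace every run of digits with '#', so 2 and 300 look alike."""
    return re.sub(r"\d+", "#", segment)


def match_score(source: str, tm_source: str, penalties: bool = True) -> Optional[int]:
    """Return a match percentage, or None if the segments differ beyond numbers/case."""
    if mask_numbers(source).lower() != mask_numbers(tm_source).lower():
        return None  # real fuzzy scoring is out of scope for this sketch
    score = 100
    if penalties:
        if re.findall(r"\d+", source) != re.findall(r"\d+", tm_source):
            score -= NUMBER_PENALTY
        if mask_numbers(source) != mask_numbers(tm_source):
            score -= CASE_PENALTY  # text is equal apart from letter case
    return score


pairs = [
    ("Add 2 eggs and 300 grams of sugar", "Add 100 eggs and 2 grams of sugar"),
    ("He heard a loud bang", "He heard a loud BANG"),
]
for new, old in pairs:
    print(new, "|", old,
          "| no penalties:", match_score(new, old, penalties=False),
          "| with penalties:", match_score(new, old))
```

With penalties switched off, both pairs score 100; with them on, each drops to 99, which is enough to keep it out of the 100% band.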
[Edited at 2014-02-28 09:02 GMT]