Brood War Korean Translations

https://blog.sourcedive.net/brood-war-korean-translations/
This was a fun read, as someone who's both a Korean BW player and a speech recognition researcher.

It's interesting to note that the original Korean transcription already had many errors, seemingly (and impressively) corrected by LLMs later on. For example, 12 안마당 빌드 (12 courtyard build) is actually 12 앞마당 빌드 (12 frontyard build), which would have been more understandable to BW players. Similarly, 투에처리 빌드 (processing-at-two build? makes no sense lol) should have been transcribed as 투해처리 빌드 (two-Hatchery build).

Therefore, it may also help to feed the slang dictionary directly into Whisper's inference process using contextual biasing. There are lots of ways to do this, but the simplest is to multiply the probability of the dictionary's slang words by a constant factor in Whisper's final prediction layer. This is fairly easy to implement, for example with HuggingFace's library: https://huggingface.co/docs/transformers/en/internal/generat...
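A minimal sketch of the constant-factor idea, in plain Python rather than HuggingFace's API: adding log(factor) to a token's logit multiplies its pre-softmax probability by `factor`, and the softmax then renormalizes over the vocabulary. The vocabulary, logits, and slang token ID below are made up for illustration; in practice the dictionary entries would first be tokenized with Whisper's tokenizer. In transformers this adjustment would typically be wrapped in a `LogitsProcessor` subclass and passed to `generate()` through a `LogitsProcessorList`, but that wiring is not shown here.

```python
import math


def bias_logits(logits, biased_token_ids, factor):
    """Return a copy of `logits` with log(factor) added to every biased token.

    Adding log(factor) to a logit multiplies that token's unnormalized
    probability exp(logit) by `factor`; softmax then renormalizes.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    boost = math.log(factor)
    return [
        logit + boost if token_id in biased_token_ids else logit
        for token_id, logit in enumerate(logits)
    ]


def softmax(logits):
    """Numerically stable softmax over a plain list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy 4-token vocabulary; token 2 stands in for a hypothetical slang token
# (e.g. the first sub-token of a dictionary entry like 앞마당).
logits = [2.0, 1.0, 1.0, 0.0]
SLANG_TOKEN_IDS = {2}

before = softmax(logits)
after = softmax(bias_logits(logits, SLANG_TOKEN_IDS, factor=3.0))
```

Tokens 1 and 2 start with equal logits, so after biasing token 2 is exactly three times as likely as token 1. Boosting every sub-token of a multi-token word independently is crude; more careful schemes only boost a word's later sub-tokens once its prefix has already been emitted.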

Do they actually use the Korean word for, like, tossing something to refer to the Protoss? That’s a pretty funny cross-language pun if so.
Don’t let the title fool you: this is an extremely thorough and creative take on translating StarCraft commentary and making it more approachable.

As the author rightly points out, over the game's 27 years of existence its commentary has become a domain-specific language, not just Korean or English.

This approach of automated scripting and using AI to understand roughly what was said and then make it coherent is really cool.

Kinda funny that in an article about translations, the author gets signal-to-noise completely backwards. A high signal-to-noise ratio (over 9000) is very good: it means you are getting a lot of signal with very little noise. Decreasing signal-to-noise means getting more noise.
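To make the direction concrete: SNR is the ratio of signal power to noise power, usually quoted in decibels as 10 * log10(P_signal / P_noise), so adding noise can only lower it. A small sketch with made-up power values:

```python
import math


def snr_db(signal_power, noise_power):
    """Signal-to-noise ratio in decibels: 10 * log10(P_signal / P_noise)."""
    return 10 * math.log10(signal_power / noise_power)


clean = snr_db(100.0, 1.0)    # 20.0 dB
noisier = snr_db(100.0, 2.0)  # ~16.99 dB: doubling the noise power costs ~3 dB
```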
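If the author sees this: with yt-dlp you can download lower-quality versions of videos to save bandwidth, like so:

  yt-dlp -f "bv*[height<=720]+ba/b[height<=720]" <url>
(where <url> is your URL or video ID)

That will download video up to 720p quality, plus the best audio track (merging the two requires ffmpeg). A bare "bv[height<=720]" selects a video-only stream with no audio, which is useless for transcription.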

Dumb question from someone who only played money-maps as a kid:

What do the numbers in front of the building mean? 12 Hatcheries seems like… well, 12 seems like a possible but implausible number of hatcheries to build (hypothetically it is possible of course). And 12 spawning pools is obviously not useful. So that makes me think it is the position in the build order list. But, they list other builds, like:

> The second is the 12 Hatch, 12 Pool, 12 Gas

Which doesn’t make a ton of sense with that parsing. I mean, it must not be a straight list. Maybe it is a tree, and 12 is the depth for this building? But that seems late; I can’t think of 11 buildings to build before gas. Maybe they include units too? Or maybe just drones/overlords?

IIRC it started with "4-pooling", which is when, as Zerg, you build a spawning pool while only having 4 workers (it's been years, I forget what they're called), rebuild your 4th worker, and then start making zerglings to achieve a super-early attack (a "rush").

Then your opponent calls you all sorts of vile names and questions your sexuality, etc.

It denotes how much supply you should have when you start the building. All of your supply at this stage comes from workers, so it's also an indication how many workers you should train.
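A toy sketch of reading that notation, assuming the simple comma-separated "supply building" format quoted from the article. The function name is hypothetical, and real build-order write-ups often mix in other shorthand that this does not handle.

```python
import re


def parse_build_order(notation):
    """Split shorthand like "12 Hatch, 12 Pool, 12 Gas" into (supply, building) steps.

    The number is the supply count at which the building is started,
    not a quantity of buildings.
    """
    steps = []
    for part in notation.split(","):
        match = re.fullmatch(r"\s*(\d+)\s+(.+?)\s*", part)
        if match is None:
            raise ValueError(f"unrecognized step: {part!r}")
        steps.append((int(match.group(1)), match.group(2)))
    return steps


for supply, building in parse_build_order("12 Hatch, 12 Pool, 12 Gas"):
    print(f"start {building} at {supply} supply")
```

Read this way, "4 Pool" is simply a single step: start the Spawning Pool at 4 supply.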
Is it how many workers you have when you build that building?
I loved this article, thanks for writing it.

I attempted playing a few World Cyber Games US regional matches and was always amazed at how much faster everyone else was. Then I remember when they live-streamed it from Korea, and I was blown away by how fast they played. From a strategy point of view, something basic about the game that I had missed became clear when a blog introduced me to the math behind a Protoss Zealot upgrade that let it kill a Zergling in 2 hits rather than 3. That's when I realized this is a chess game, and I got hooked.
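Assuming the "power up" is the +1 ground weapons upgrade, the arithmetic works out with the commonly cited Brood War values: a Zealot strikes twice per attack for 8 damage each, a Zergling has 35 HP and 0 base armor, and each weapons level adds +1 to every strike. The "hits" in the comment correspond to attacks below. A small sketch under those assumptions:

```python
import math


def attacks_to_kill(hp, damage_per_hit, hits_per_attack, armor=0):
    """Attacks needed to kill a target; armor is subtracted from each individual hit."""
    effective_hit = damage_per_hit - armor
    if effective_hit <= 0:
        raise ValueError("this sketch assumes each hit does positive damage")
    return math.ceil(hp / (hits_per_attack * effective_hit))


ZERGLING_HP = 35  # commonly cited Brood War value, 0 base armor

# Zealot: two strikes of 8 per attack -> 16 damage -> ceil(35 / 16) = 3 attacks
base = attacks_to_kill(ZERGLING_HP, damage_per_hit=8, hits_per_attack=2)
# +1 ground weapons: two strikes of 9 -> 18 damage -> ceil(35 / 18) = 2 attacks
upgraded = attacks_to_kill(ZERGLING_HP, damage_per_hit=9, hits_per_attack=2)
```

Because armor applies per strike, one level of Zerg carapace (armor=1) brings the upgraded Zealot back down to 16 damage per attack and 3 attacks.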

I get that it's "wrong" but I really like the translation of "natural expansion" to "courtyard"
I really wish someone with the resources and connections could get in touch with South Korean broadcasters in order to get access to their archives so that more historical games could get uploaded and re-commentated for a western audience.

My favorite Brood War slang term is Ee Han Timing [0]: basically when you take a risky build that has to do damage in a small timing window. A ton of exciting Brood War moments come from exploiting tiny timing windows.

[0] https://liquipedia.net/starcraft/Ee_Han_Timing

For any of the lucky 10,000, like me, who were left wanting to see what this game looked like:

https://youtu.be/Nm-PXmOELAw?si=Z-RXbdqNzkSF3cqx

My brief search didn't turn up any more obscure Korean-only strategy videos, so maybe this one is just for the lowly foreigners :(

Reminds me, many years ago someone paid me to translate a Korean wiki article about some League of Legends pro player to English. No idea why, most of it was random trivia, it didn't contain any notable insights. But it was decent money as a side job so I didn't bother asking. Possibly similar motives to this article?

> Very few members of the foreigner community are fluent in Korean. Foreigner access to Korean BW discourse is a contradicting concept: if you speak Korean fluently, you have no reason to be in the foreigner community, as it only has access to material that is strictly inferior and more limited. For this reason, Korean-speaking members in the foreigner community are exceedingly rare.

I can vouch for this in general - after becoming fluent I've stopped looking up anything related to Korea in English because the quality of information is much worse. I'm sure the same holds for other languages and places.

I wonder how well AI audio generation would work here, to produce a voiceover video like the original input.

Pretty cool, and it shows a clear issue I’ve seen across every LLM: the language and grammar are so formal/robotic.

Neat. I wonder what Google Translate uses these days, and whether it has gotten or will receive an update to a new LLM.

As the author points out, this does seem like exactly the kind of language problem that LLMs ought to excel at, and I love that moment of discovery when the testers were so busy discussing the content that they forgot to focus on the accuracy of the translation!

OK, but where are the VODs?