Wednesday, 25 April 2012

Computer language mystery solved by humans


Computers have languages, too. According to an article in American Scientist, even the experts do not agree on how many programming languages there are – estimates range from 2,500 to over 8,500.

One recent example which highlighted this variety was the mystery of the programming language used in the creation of “Duqu”, a computer Trojan which has been studied by heavyweight anti-virus companies like Symantec, Kaspersky Lab and F-Secure. These IT giants could inspect the compiled code that made up the Trojan, but they could not identify the programming language in which it had been written.

Why didn’t they ask a computer?
To me, as a mere computer user without a programming background, the solution appears simple. It is a computer language, and a computer is obviously able to follow the instructions in the code (otherwise the Trojan would be of no use to the crooks who created it). So a computer should be able to identify what language it is. This seems to be an obvious logical conclusion.

But it is not so. Igor Soumenkov, a Kaspersky Lab Expert, wrote a blog article “The Mystery of the Duqu Framework”. The article outlines the history of the study of Duqu and the structure of the threat which it poses, and it ends with an appeal which amazed me: “We would like to make an appeal to the programming community and ask anyone who recognizes the framework, toolkit or the programming language that can generate similar code constructions, to contact us or drop us a comment in this blogpost.”

Digital guesswork?
Soumenkov received a flood of blog comments and e-mail responses, and the mystery of the programming language has now been solved. But it is interesting to check out the wording of the 159 comments on the original blog article. They are peppered with phrases like:
That code looks familiar
It may be a tool developed by ...
I think it's a ...
What about ...?
Just a guess ... the first thing that pops to my mind is ...
Sounds a lot like ...
I am not a specialist but I would say it could be ...
One more guess ...
This does smell to me a little bit like ...
I'm gonna take a wild guess ...
Plus a generous sprinkling of words like might, perhaps, maybe, probably, similar, clue, feel, remember, possibility and similar vague terms.

Data or brains?
For me, this throws an interesting light on the use of computers in natural language processing. The human guesswork in the comments on Duqu included many ideas that turned out to be wrong, but the brainstorming process was helpful to the computer experts involved, and the fuzzy process of human thinking led to a solution which evidently was not possible with the computer alone. And all of this for a language which is only useful in computers and has no meaning for human communication (when did you last _class_2.setup_class13)[esi]?).

The situation in translation between human languages is comparable. Automatic translation programs from Google, Microsoft, IBM and others can achieve a certain amount of pattern recognition and sometimes come up with plausible solutions. But only a competent human being can evaluate whether this solution is really accurate or appropriate. So these programs can be a useful tool in the hands of an expert, but there is a distinct risk that they may get the wrong end of the stick.

5 comments:

  1. I can't see machines taking over the jobs of human translators in the near future, as they have done with so many other professions (remember telephone operators?)
    These machine translators are OK when all you need is a quick understanding of some rather simple text, but if you are running a business, or otherwise depend on the accuracy of a translation, using professional translation services is the only way to go.

  2. Hi brix, yes I agree on the predominance of professional human translation. In my own translation work, I often get legal texts that even educated native speakers of my source language (German) find difficult to unravel, so I shudder to think what a machine would make of them.

  3. Hello, I would like to ask what the scope of C language training is and which topics should be covered; it is kind of bothering me … and has anyone studied from this course http://www.wiziq.com/course/2118-learn-how-to-program-in-c-language of programming in C? Or can you give me any other guidance...
    I would really appreciate help… and I would also like to thank you for all the information you are providing on C concepts.

  4. Hi shipra, I think your question is more appropriate on wiziq.com. The focus of my blog is on natural language and translation, and my own expertise is in German to English translation. I am not a specialist in computer programming languages - in this article I simply used some examples from computer programming to make a comparison between different types of languages, i.e. historically developed "natural" languages such as German and English on the one hand, and the artificial programming languages used in IT on the other.

  5. Asp.Net,c,c++,PHP,android, etc.
