building a tree from a flat definition (with a stack)
Something I find myself doing often is trying to build a hierarchical data structure (tree) from a flat definition of some kind. An example is that I have a simple formatting language whose parser outputs to HTML, and it can handle nested lists. The list is just a flat piece of text, but to convert it into valid HTML, it’s easiest to convert it into a tree and recurse down it. This is one of those situations where the right choice of data structure makes the resulting code a lot cleaner and more direct. I’ve had to do it in a bunch of other situations too, so it seems to be a semi-common task and it’s not immediately obvious how best to do it (unless I missed that class).
The way which seems to work best is to use a stack. Iterate through each of your elements in your definition and create a node for each one. The node needs to keep a list of children (and whatever other information is relevant). It doesn’t need to keep track of its parent. As you’re going through, put each node on the stack. When the node is complete*, pop it and append it to the children of the node which is now on the top of the stack. The last node you pop before the stack is empty should be the root node of your tree. Simples. You’ve now got a nice tree you can recurse down or whatever.
Notes:
- It can be easier to put a dummy root node on the stack first. This is so that you don’t have to worry about popping the root node and it going out of scope.
- At the end, you may find you still have a bunch of nodes still on the stack. Obviously you have to pop these and append them as children before the process is complete.
- At any one iteration, your data definition might be such that you need to pop more than one node.
*Ascertaining when a node is complete depends how your data is laid out, but let’s say you’re using Pythonesque left-leading indentation, a node is complete as soon as the indentation level of the element you’re looking at drops below the indentation level that corresponds to your node.
GeSHi 1.2
I checked out the SVN branch of GeSHi 1.2 today. I looked at GeSHi very closely about a year ago and was less than impressed. A lot of people seem to rate it, and to be honest, this mystifies me. I ended up writing my own syntax highlighter, which is better than GeSHi, I think.
Anyway, I looked at 1.2 and it’s better than it was, but I’m still not overwhelmed. As far as I can tell, it has been re-written to emulate a state machine, which is definitely an improvement, but the code seems convoluted and it still doesn’t seem that powerful. I gave it some JavaScript and it was easy to make it confused on regex literals. JavaScript’s disambiguation for the ‘/’ symbol is very simple and well defined, but if your lexer is not powerful/extensible/explicit enough, it will not be able to check the condition and you’ll have to resort to hackish regular expression look-behinds (which won’t always work).
I looked at the code and to be honest I don’t really understand it, but it looks like it’s still using one central lexer to handle everything. When you aim to recognise more than a few languages, it seems ridiculous not to try to abstract and equate them, but that’s exactly what I didn’t do and I think it was the best choice I could have made. That’s the thing with abstraction – it can make life easier or it can make life harder. At worst it can be a total pain and and be a problem in its own right, but more often, it just limits you. The key is in choosing what to abstract, and the best answer is not always the most obvious. It pays to be clever in your design so you can write straight-forward code; the minute your code has to be ‘clever’ is the minute you need a new design.
TEAM FORTRESS 2
AS SHE IS PLAYED
YES. Valve released TF2 for free last week and I got around to trying it out on Monday. It’s pretty cool. The 6GB download took all day but it was worth it.
I used to play Team Fortress Classic YEARS ago. I played it a lot. I remember some screenshots of a development version of TF2 from around that time, it was done in the HL1 engine and looked nothing like it does now, and I was really excited by it, because, at the time, online gaming via the ancient 56K modem was a kind of magical alternate reality full of two dimensional explosions and people making crude remarks about your mother. But then I stopped playing it because of, you know, Counter-Strike. So I’ll admit I was never amazingly enthusiastic about TF2 by the time it eventually showed up because it seemed like something I’d already done and got over.
I WAS WRONG. I could go on about it for hours and explain the minute sections of the game, but I’ll cut all that out and just say it’s fun. Very fun. Hilarious fun. Light hearted, silly, and nonsensical fun. Unlike TF or TFC, which were supposedly serious war games (well, so they thought), it has camp pseudo-60s evil genius feel to it which is absolutely endearing. Even my sister is playing it and she has never played a multiplayer FPS before.
It’s also quite newbie friendly. There are a lot of classes but you don’t have to choose a very active one to begin with. The best beginner class is probably medic, you run around like crazy trying to heal people and don’t worry too much about attacking anyone. The strongest class is heavy, he is very powerful but very slow and draws a lot of fire, so if you find one, stick to him and keep him healed. Once your medic gun charges up (bottom right), you can make someone invincible by right clicking while your beam is on them (and keeping it on them). Again, unless there’s an emergency, you should wait for a heavy near the front line then ubercharge him and hope he rushes in and kills ALL THE THINGS. You get credit if someone kills when you’re healing them (kill assist) so it’s all good. If all else fails, ignore the needle gun whose needles are too slow moving to be useful most of the time, instead get out your melee weapon and go insane. I have found that most people don’t prepare for the case of a medic rushing at them with a bonesaw. Unfortunately, some do.
The downside is that Steam is annoying and there are soooooooo many updates for TF2. I trust that adding extra letters makes up for the missing “that” statement. I only downloaded it on Monday and I swear it’s had to download updates no fewer than four times since then. Also the item system is clearly a Skinner box; you get items which can alter your loadout for each class (different guns, melee weapons, hats…), and you can either buy the items, or get them randomly as you play. Obviously, the upshot of this is that people probably feel compelled to play past the point of fun in the hope of getting a new item soon. Games shouldn’t do this. It is evil.
The free version has some slight limitations in terms of what items you can get and how many you can hold (50 instead of 300, I think). To upgrade to the premium version it costs £4 (that’s because Steam won’t let you put any less into a ‘steam wallet’, and then you need to buy an item from the shop. The cheapest is currently £0.29. So you can spend the other £3.71 on some indie game or something). I upgraded my account because the game is worth £0.29 or £4, however you look at it.
php performance woes
I found a fairly strange case of php performance scaling today. I’m sorry, but I’m not going to try to separate this out of a several thousand line codebase to reproduce the exact conditions it’s happening, so we’ll just look at the snippet that directly affects the performance.
The function at fault is building a tree via a stack, the algorithm probably has a name but I don’t know it. It’s not important. We’re inserting some text into the tree as a child of current node, and we’re trying to avoid adjacent separate strings by concatenating them. As it happens, the concatenation here was later rendered redundant.
The function looks like this:
<?php
$children = & $this->node_stack[count($this->node_stack)-1]['children'];
if (!empty($children) && is_string($children[count($children)-1])
$children[count($children)-1] .= $str;
else
$children[] = $str;
you notice what about this code? It’s not entirely optimal, correct! But unless you’re a micro-optimisation nazi, you’ve probably misjudged the real impact it has. With this code, a 30KB input runs in about 0.55 seconds and a 60KB input (the first input doubled up) runs in 3.5 seconds. A 200KB input triggers a 60 second timeout. Err, clearly that’s not very good..
If you remove lines 2, 3 and 4, (these are redundant, remember) everything runs perfectly linearly: 30KB runs in 0.3 seconds, 60KB runs in 0.6 seconds and 200KB runs in about 2.0 seconds.
The problematic part turns out to be the call to count($children). The two following functions have 1) linear and 2) exponential(ish — didn’t plot a graph) runtime
<?php
// linear
function fast($str) {
$children = & $this->node_stack[count($this->node_stack)-1]['children'];
$children[] = $str;
}
// exponential (probably)
function slow($str) {
$children = & $this->node_stack[count($this->node_stack)-1]['children'];
$children[] = $str;
count($children);
}
This *really* shouldn’t happen. Everyone knows that count is O(1). And we’re also calling count on the node stack with no adverse effect.
Here’s the bizarre thing, as generated by the profiler on the same input:
fast: called 7924 times, 2.59% of program runtime
count (as callee of fast): called 7924 times, 5.23% of fast()
slow: called 7924 times, 42.3 % of program runtime
count (as callee of slow): called 15848 times, 0.55% of slow()
yeah! Read that again: count takes up a lot less relative time in slow() even though it’s called twice as many times, even though its presence magically causes the calling function to slow down massively. According to the profiler, the absolute time taken for count() is about 3x as much slow() vs fast(), we expect it to be 2x, but 3x is probably within error bounds.
It’s safe to say I haven’t got a clue what’s going on here.
MYSTERY.
this is php 5.3 and xampp on Ubuntu x64. I’ll find out exactly which version of php if anyone cares.
err edit, this seems to be it: http://bugs.php.net/bug.php?id=34540… OMG PHP is retarded sometimes. “Thank you for taking the time to write to us, but this is not
a bug.” umm yes it is.
bamby002a@yahoo.com, you suck
My webserver got a strange looking request earlier. In fact, it got two identical ones within 10 minutes of each other. One came from America, one came from Moldova, but upon further inspection, both are running web servers, and the user agent in both is ‘libwww-perl/5.805′, so they likely fell prey to the same attack and are now being used to propagate it to other sites. Turns out it was trying to perform a remote include exploit, so naturally, I had a look at exactly what it was trying to include. It tries to pull in a php script which implements a file upload and shell interface (via system()), which, if successful would give an attacker a fair amount of control over the server. Upon successfully including itself, it would email somebody a little message with the server details. The author was a little sensitive about this and he hid that code behind a base64 string and evaled() it. The email address was hidden inside that as yet another base64 string (as if I’m going to decode the first one then suddenly forget how to do it?), and it was:
bamby002a@yahoo.com
If you Google it, it belongs to an Indoneasian called Bambang Ariyanto, or BaMbY. Bamby’s not really the kind of smooth operator you’d expect such a leet haxor to be. You’d expect him to be using a throwaway email address and he’d connect to it a couple of times a week through Tor or something. If he’s sensitive enough to double-base64 encode identity information, you’d think he’d be keeping a low profile. Not so. Instead, he’s using what appears to be his personal email that he has registered elsewhere. He’s got social networking profiles and photos all over the place. About 10 minutes after I saw my server logs, I’d found his full name, location and several photos of him. In fact, you could say he’s a bit of an eejit.
I know what you’re thinking; maybe he just had his email hacked, and it’s not really him? It seems pretty unlikely. On his http://buahkata.blogspot.com/ blog, he has talked about free software and ReactOS [if you use Chrome it'll translate it for you]. On his http://b4mby.multiply.com/ Multiply he has a link to http://www.securityfocus.com/, a site listing recent security vulnerabilities in software. We also have a posting here by someone called BaBmY, linking to a geocities (yahoo) account called ‘bamby002a’ (check the email address), in which he’s sharing an extraordinarily ineffiecient md5 brute forcer written in Perl, even though you could put together a much faster implementation in C in fewer lines.
Here’s his website: http://www.bamby.web.id/
there’s nothing there, but feel free to click it so he sees this in the referrer.
Bamby Ariyanto: you sir, are an idiot.
regular expressions: the cause of and solution to, all of life’s problems.
uhum, a few years ago I wrote a regular expression to try to approximate the number of syllables in a word.
Today I dug it out and decided it was not good enough! Instead of approximating the number of syllables, let’s try to make it match the full syllables!
The result follows. It’s in Python re module syntax, which is pretty much PCRE but you’ll notice python lets you do non-fixed length assertions (huzzah), which PCRE does not (boo). Also I apologise for some bad constructs but I wrote it several years ago. All {0,1}s should be ‘?’ and all x|y should be [xy], but when your regexes get this big you get wary of changing them.
You have to split your input by re.split(‘[^a-z]‘) because the regex operates at the word level. Then you do an re.findall() with the re.X/re.VERBOSE flag set.
(?:
[bcdfghjklmnpqrstvwxz]*
(?:
[aeiou]+[^aeiou]e(?=[aeious])
|
#Easy vowels first, each may be repeated up to 3 times.
(?:
#i may not be part of ing, eg 'trying' is two syllables but here would be counted as one.
# ing is addressed below.
# likewise ia iod ua are generally pronounced as two syllables (deviation, period, usual).
# These are excluded here to remove from the repetition qualifier.
a|i(?!:ng|a|o)|o|u(?!a|se)|y
|oe
#e is not so easy, and to avoid the repeatition qualifier it is also addressed underneath.
# match e if not on the end of a word like 'note', or plural forms, like 'notes'.
| (?:e(?!(?:s){0,1}$) )
){1,3}
# now we address the possibility of 'e' being on the end of a word and does qualify for a syllable,
# for example 'maybe', or 'couple'
#Do match e at the end if preceding letters were not such that a silent e is likely:
# >1 consonants EXCEPTION: r two letters before the e, as in 'barge', 'nce' as in variance
# errrr shouldn't the last subpattern be a look AHEAD?
| (?<=[^aeiour][^aeiou])(?:e[sd])(?:[\s]|$)(?<!nce)
#and we rejected e when on the end of word or a plural, but what about voices or aces.
|ces$
#short words ending in e or a, like 'me', 'the', 'sea', etc.
| ^[a-z]{1,2}[ea]$
#excluded exceptions from earlier. Note non-consumed following letters to avoid overlapping.
| ing | i(?=a)
| i(?=o) | u(?=a)(?!are)
)
(?:
(?:
# bah this needs refactoring
bb|dd|ff|gg|ll|mm|nn|ss|tt
|ch|sh|ght?|lt|th|rn|st|ft|nd|rt|rd|gn$|nt|ct|(?<=[aeiou])xp|lf
|ng|ck|rst|lp|ld|rp
|
[bcdfghjklmnpqrstvwxz]?
)?
(?:s$)?
)?
)
|.+
I ran it on an excerpt of a Pirates Of the Caribbean article from Wikipedia and it split the words as follows:
unintelligible ['un', 'int', 'ell', 'ig', 'ib', 'le'] relinquishes ['rel', 'in', 'quish', 'es'] intelligence ['int', 'ell', 'ig', 'en', 'ce'] negotiations ['neg', 'ot', 'i', 'at', 'i', 'ons'] interceptor ['int', 'er', 'cep', 'tor'] encouraging ['en', 'cour', 'ag', 'ing'] intelligent ['int', 'ell', 'ig', 'ent'] calculating ['cal', 'cul', 'at', 'ing'] vocabulary ['voc', 'ab', 'ul', 'ar', 'y'] lieutenant ['lieut', 'en', 'ant'] previously ['prev', 'i', 'ous', 'ly'] deciphered ['dec', 'ip', 'her', 'ed'] acquiesces ['ac', 'quies', 'ces'] norrington ['nor', 'ring', 'ton'] desiderata ['des', 'id', 'er', 'at', 'a'] admiration ['ad', 'mir', 'at', 'i', 'on'] strategies ['strat', 'eg', 'i', 'es'] intentions ['int', 'ent', 'i', 'ons'] proclaimed ['proc', 'laim', 'ed'] grappling ['grap', 'pling'] elizabeth ['el', 'iz', 'ab', 'eth'] evidenced ['ev', 'id', 'en', 'ced'] advantage ['ad', 'vant', 'ag', 'e'] returning ['ret', 'urn', 'ing'] seemingly ['seem', 'ing', 'ly'] commodore ['comm', 'od', 'or', 'e'] actuality ['act', 'u', 'al', 'it', 'y'] consensus ['con', 'sen', 'sus'] announces ['ann', 'oun', 'ces'] swordsman ['sword', 'sman'] suggested ['sugg', 'est', 'ed'] resulting ['res', 'ult', 'ing'] determine ['det', 'er', 'min', 'e'] indicated ['ind', 'ic', 'at', 'ed'] persuades ['per', 'suades'] murderous ['murd', 'er', 'ous'] negotiate ['neg', 'ot', 'i', 'at', 'e'] extremely ['ex', 'trem', 'el', 'y'] mutinied ['mut', 'in', 'ied'] stabbing ['stabb', 'ing'] explains ['exp', 'lains'] attitude ['att', 'it', 'ud', 'e'] immortal ['imm', 'ort', 'al'] contrast ['cont', 'rast'] although ['alt', 'hough']
Not bad?
IPs as evidence
I am reading a lot of things to do with file sharing and flawed evidence lately. One of the common misconceptions is that IPs can be arbitrarily spoofed. This isn’t really true. You can spoof your IP but you won’t receive any packets, so it’s kind of a pointless exercise unless you’re doing something cunning. It’s not practical for web usage. It’s very hard to do it most of the time because the data transfer protocol you use will most likely be built on TCP which requires a forwards-backwards handshaking procedure to establish a connection first, and it’s hard to bluff your way through the handshake if you don’t know what the other person’s sending back. Or as nmap says: TCP Sequence Prediction: Difficulty=204 (Good luck!).
But I think this is all missing the point a bit. The IP in the first place is such flimsy and meaningless evidence. There are too many humans in the chain who stand to gain from just making them up and it’s trivial to do so. Between a law firm and a ‘monitoring’ firm, you’ve got two separate entities whose entire business depends on being able to come up with a set number of IP addresses per month. It isn’t a difficult task to add on an extra 10% to ensure you get a nice bonus. It isn’t even a difficult task to fictitiously generate 95% of your data.
If I was doing this, my first thoughts would be like so: I’d probably proceed by recompiling a torrent client to write a list of IPs for each torrent to a text file (filtering by UK IPs belonging to ISPs who don’t protect their customers), then if I was a greedy cretin I’d most likely bump up the number like so:
#!/usr/bin/env python
import commands
import random
import re
ip32 = lambda n: sum([int(x)<<8*(3-i) for i, x in enumerate( (n.split(".")) ) ])
def random_ip_from_subnet(network, subnet):
ip = 0
for i in range(32)[::-1]:
snbit = (subnet >> i ) & 1
ip = ip << 1
ip |= (network >> i) & 1 if snbit else random.randint(0, 1)
return ip
users = [] # read user list in here as list of (ip, date) tuples
newusers = []
for ip, date in users:
newusers += [(ip, date)]
subnet = commands.getoutput("whois {0} | grep route | awk {{'print $2 '}}".format(ip)).trim()
if re.match("^(\d{,3}.){3}\d{,3}/\d+$", subnet):
[network, mask] = subnet.split("/")
networkip = ip32(network)
maskip = 2**(32-int(mask))-1
for i in xrange(3):
fakeip = random_ip_from_subnet(networkip, maskip)
fakedate = date + (random.random() - 0.5) * (60*60*24*28)
newusers += [(fakeip, fakedate)]
# write newusers to file here.
(note: untested code)
And there we go, I’ve just increased my income by 300% in about 30 lines of code. Each IP I generate is on the same ISP as one of the ones I’ve really captured, so I know that I’m not going to have to submit multiple court orders. Some of these will likely not be allocated at the time of the alleged infringement, so the ISP won’t be able to identify them to a user, but IPv4 is oversubscribed so we’re not just taking wild stabs in the dark. A lot of these will be allocated. I’ve looked through ACS:Law’s spreadsheets and on one of the BSkyB ones, 1387 out of 5699 records are ‘unknown’. On another 1325 out of 5358 are unknown. What’s going on here? It doesn’t say why they’re unknown, but it’s a massive failure rate.
The basic idea of this code is that it takes a known IP, it then looks up that IP’s WHOIS record which tells us what range (subnet) it’s on. Then we start generating IPs at random from that range. We also randomise the date within a 14 day range (although if I was making a serious effort, I’d then truncate the hours/minutes/seconds and instead choose time of day randomly from a non-linear distribution to favour evenings during the week (negatively skewed normal distribution, mean 8pm) and daytime at weekends — this would take an extra 10 lines or so but I’d have to read up on both the random and date modules so I didn’t bother). Also if I was making a serious effort I’d drop or regenerate IPs whose last byte is 0 or 0xFF.
When someone says “I’ve got an IP address” the correct response is not “alright here’s a court order to get the ISP to tell you who it belongs to so you can send them a letter demanding £500″; the correct response is “so?”.
gridverse
http://twitter.com/WCGrid/status/24028589385
Team “Dinosaur Comics – qwantz dot com” has added over 1,100 new members in the last 7 days! WOW! http://bit.ly/4GrCsk
1 year and 293 days of CPU time were logged yesterday from team Dinosaur Comics. My account contributed 1 day and 19 hours of that. GRID COMPUTING IS FUN. Also it keeps your room warm.
I was browsing Wikipedia’s multiverse page today and came across the following interesting quote by a cosmologist called Max Tegmark:
The algorithmic information content in a number is, roughly speaking, the length of the shortest computer program that will produce that number as output. For example, consider the set of all integers. Which is simpler, the whole set or just one number? Naively, you might think that a single number is simpler, but the entire set can be generated by quite a trivial computer program, whereas a single number can be hugely long. Therefore, the whole set is actually simpler.
This is an interesting observation. He uses this to defend that the multiverse hypothesis does not fall afoul of Occam’s razor. Occam’s razor literally says “do not needlessly multiply entities” but in common usage it just means “the simplest answer is probably right” or rather, a bit more strictly, “the answer which introduces the least number of new things (weighted against their improbability) is probably right”. Something which would fall afoul of Occam’s razor is an answer that raises more questions than it answers or just seems to introduce new things for no real justifiable reason.
The answer given by Occam’s razor may only be changed when new information becomes available. Thus Occam’s razor works very well if you have a lot of information, not so well if you don’t, but there is never a more reliable alternative.
But if Occam’s razor is to be used rigorously, an important question is raised: how exactly do we measure simplicity or complexity? Max Tegmark above opts to use Kolmogorov complexity, which defines a measure of complexity of a piece of data in terms of how much data is required for a computer to generate it. And he is right in that it takes a shorter program to generate the whole set of integers than to just generate one arbitrary one (because the average integer is pretty damn long). Elegantly in Python,
f = lambda n: [n] + f(n+1) print f(0)
At n=infinity it all collapses beautifully into a list of all the integers. And this is in 37 characters. One single 40 digit number can’t be expressed in 37 characters*, but somehow I’ve managed to express every 40 digit number, and a whole lot more, in just 37 characters.
Of course in practice, n is never infinity and it gives you a recursion depth exception, or you run out of memory and it crashes. I am unclear on whether Kolmogorov complexity requires the answer to be generated in a finite time, but I think the overall point stands regardless. We can see and trivially prove (assuming I didn’t make a mistake) the above program generates the set of all integers. It’s a valid mathematical description whether or not it could be evaluated by a machine with finite resources.
I am not a physicist (obviously) but I think there’s a case for the multiverse which basically equates to Occam’s razor.
There are apparent contradictions in choosing between either a single universe arising from an unintelligent mechanism or a single universe arising from an intelligent design mechanism. If it was unintelligent, how did it get so much right? By which, I refer to the so called fine tuned constants of the universe. Whilst I hesitate to accept that no other form of life could occur in a universe with different tuning, I am prepared to accept that there are more ways to make a universe uninhabitable than inhabitable. But if the design was intelligent, what is everything else doing here? It’s a wasteland. Neither explanation is satisfactory. The multiverse addresses the shortcomings of both.
Given that life exists, this universe must be inhabitable. The range of “inhabitable” spans from something which is the bare minimum to support life, to something which has no wasteful elements whatever and appears to be completely designed for life. To assign each a probability makes no sense for a single universe, because probability talks about average cases, but if there is only one case then this is by definition the average one. But if we consider a multiverse probability becomes a much more powerful tool: We expect the former (a just about adequate universe) to occur with much higher frequency than the latter (a perfect universe). Note that I shan’t assess the probability of any given universe being inhabitable, it’s not relevant to the argument, we just assume it’s non-zero.
The nearest we get to testing the multiverse hypothesis is to assess in its context how likely our own universe is. Our own is not so perfect as to be improbable by the multiverse hypothesis, in fact, outside of isolated pockets our universe is simply a wasteland which is very hostile to life, hostile through negligence rather than any concerted effort. Even the part we are in will only be friendly for a short amount of time relative to the timespan of the universe. So our own universe is safely towards the ‘bare minimum’ for life, and so it is within the expected probabilistic bounds for an inhabitable universe of a multiverse. This was only one test, but for now the hypothesis is unscathed.
It seems like introducing many universes should be a prime candidate for the accusation of “multiplying entities”, but I disagree. The example above shows that the set of all integers may be represented very concisely and more easily than one arbitrary integer. This is because the entity of importance is not the integer itself, but the method which generates it. And this parallels the universe: The universe isn’t the important entity, it’s the creation process we care about. We already accept that a creation process must exist, we merely specify a detail about that process. That detail could well be an extra entity, I agree, but in return this extra entity grants us a neat explanation. It beats the god/design argument because we replace actual intelligence (which requires a whole GREAT BIG explanation for itself) with an unintelligent pseudo-randomised process (which also still requires an explanation, but if it does not need to ‘think’, then it’s simpler). It beats the idea of a single universe because otherwise we would have to account for how an unintelligent process could pop into existence a universe which just happens to have all the settings tuned such that our physics and chemistry set the stage for the evolution of complex biological systems (i.e. us), and why it was invoked exactly once.
Even if the fine tuning assumptions turn out to be wrong, the idea that only one universe can exist is still baffling. One is a very arbitrary number. Having zero universes existing would be neat. Going from zero to one is very strange, but apparently it happened, so there we have it. But going from one to two is not so strange. And in fact not going to two implies some form of limit, which demands to be explained.
It seems to me that the multiverse is the best candidate to respect Occam’s razor.
In the interests of false trichotomies, I should also point out that another possibility is possible that there are actually infinite universes but they’re all identical. This still has the fine tuning issue, though.
_________
* Obviously some integers are factorable or otherwise reducible into shorter representations, but let’s assume this one isn’t. And if it is, make it 10 million digits and keep picking random numbers until you find one that definitely can’t be expressed in fewer than 37 characters.
more employment woes.
Continuing my employment escapades, I have been offered a telephone interview for a job I applied for. This sounds good but I am slightly disconcerted by it as I have no recollection of applying for the job, and it is not present under my ‘application history’ on the agency I supposedly applied through.
Anyway, the AMAZING thing about it, apart from the fact the person doing the interviews is ‘head of HR’, and he’s managed to weasel his way into employment by pretending he can divine programming prowess over a telephone, is that it’s offering 14,000 a year. YES! And no it’s not part time. I assume they are serious anyway.
There are a lot of jobs which are advertised as software ‘developer’ but are anything but. I this might be one of them. I did not respond to the email. I’ve also had an email from the same agency trying to interest me in another ‘development’ job. It has some really fuzzy requirements for candidates, one is “high computer literacy”, which is a baffling requirement for a job writing software. It also requires a “higher IT qualification”. It doesn’t even say degree. Or computer science. Or software engineering. Or anything like that. It’s just “IT”, which can mean just about anything. They are clearly not really looking for a developer. The agencies are useless because to them, I am an “IT” person. But to a company, an “IT” person is a catch-all phrase that could mean anything from someone who works on a helpdesk to a project manager. And generally speaking, if a company advertises for “IT”, they don’t want a programmer.
And I noptice these adverts with vague requirements, which obviously aren’t interested in or aren’t aware of strong technically able candidates, are always the ones that mention Microsoft technology. There is a DEFINITE correlation.
Some time after I graduated I had a wonderfully patronising conversation with a ‘friend’ online. I wish I still had it, but I’ve checked, and apparently my logs don’t go back that far. She decided for herself that I was universally unemployable unless I knew about ‘MS stacks’ and out of nowhere she asked what I knew about them. When I replied ‘pretty much nothing’, she said something along the lines of “if you start studying it then in 6 months I might consider hiring you”! I was speechless because 1) Their company uses Visual Basic, which is a programming language aimed at non-programmers who just want to make their computer to do some calculation 2) I’d never expressed any interest in working for her, or working with an ‘MS stack’ and 3) After I’ve learnt to write code for IIS and SQL server, what am I supposed to be doing for the other 5 and a half months before I become ‘hirable’?
She also complained that they recently recruited someone with an MSc in computer science and this person was useless. Yes, really? Well if you assume that all recent graduates are useless morons (like she did me), well, that’s the kind of graduate you’re going to attract when you’re hiring, isn’t it? In fact, being a Microsoft set-up in the first place is going to limit which graduates you are attractive to; programming is hard and it takes a lot of practice to get good at it. Unless you’re doing it as a hobby in your own time, you’re not going to get good at it by the time you leave university. University courses are just not programming-intensive. Your best bet for a competent graduate programmer is therefore a hobbyist, but the vast majority of hobbyist programmers use Linux because Linux was built by hobbyist programmers for themselves, it just works better. So I find it difficult to apply to C# jobs because although I have used it at university and it was totally indistinct from every other language ever, I have no experience using it at home so I do not have the same level of comfort with it as I do C or PHP or Python (or even JavaScript or C++ or Java), which are Linux friendly languages. In this respect, a naive interview process would filter me out in favour of much less proficient and practically experienced programmers, just because I couldn’t tell you off the top of my head how to access session data in C#.NET, or something.
The other thing is that Microsoft development tools are in one part aimed at people who don’t want to be concerned by any details of how what they’re doing actually brings about some working software. They don’t want to know about compiling all of their sources to object code then linking them all together into a single binary. They don’t want to know about sudo vim /etc/apache2/httpd.conf. They just want to press ‘run’ and have everything happen for them (and they want to avoid vim too).
Now there’s nothing inherently wrong with simplifying the process, but Microsoft pushes to get their environment included into university courses*. Combined with a large number of university students who are at university not because they’re genuinely interested but more because they’re expected to be, many students’ first and only exposure to a software development environment is writing 10 lines of C# or Visual Basic and then pressing the magic green button. If you choose your modules tactically it’s possible to get through university scraping a 2:1 having never written more than 50 lines of code in a single file and having never gained a lower level understanding of the compilation process than “it’s that green arrow button on the toolbar”. They’re not good programmers, they don’t understand what they’re doing, they are useless. You wouldn’t trust them to change their own underwear unsupervised. You certainly wouldn’t trust them with commit access to your main source repository. I see a lot of jobs catering to this kind of graduate, but quite few who are willing to consider that some graduates have a vague idea what they’re doing. It’s quite frustrating.
__________________
*not because they’re generous, of course, but because it encourages lock-in for developers. Visual Studio is cleverly designed to make it hard for new users to know what’s actually going on (I alwasy found it frustrating in this respect), so if you start people off on this, they’ll probably never move away from it to something that appears more difficult. This also feeds their OS monopoly as the programs their tools create will be Windows only. This is especially important in the server market where they lag behind. If someone writes a web back-end in C# then that has to be run on a Windows server. Windows of course has zero selling points of its own merit as a server platform because it’s harder to administrate than a Linux box, and Microsoft’s dedication to security has always been sketchy at best. They use the lock-in process to make their own substandard product marketable.
They do the same with Windows and schools. Windows and Microsoft Office absolutely must be in schools because they can’t afford to let children’s first exposure to computers be through Mac or Linux, because Mac is a far more friendly platform, and Linux has distributions specifically tailored to children, which are also presumably more friendly than Windows. Therefore in absence of a better product, Microsoft must endevour to remain being seen as the ‘familiar’ ‘safe’ choice. They spend a lot of money on this and they earn it back because 90% of the children in the class will be loyal Microsoft customers from then on. They’re a joke software company but they’re very good at marketing.
i had a title here but wordpress stripped it
it was <? echo ‘hello’; ?>
BAH
So I should probably post an update on my life about now.
I have mostly been busy with a WEBSITE. not just any website, but it is in fact my website! zomg. It is partly for self interest and partly a bit of a ‘portfolio’ for the purpose of getting a job. I think if people can go and inspect your abilities without having to contact you, then they’ll be more likely to offer you an interview than if they just read your cv and see some graduate claiming they can do this or that.
I have written a tonne of code in getting it working; I wrote my own content management system and I’ve got my own formatting language and parser. I also have created a source code browser and a few other things. I even set up a proper virtual machine with Apache, MySQL and PHP on it to test it all out on, AND with a subversion repository. PRETTY COOL. Rather useless but it might pop up in an interview, ‘do you know how to configure a server from scratch’ and I can now say ‘yes!’. Or at more honestly, ‘not really, but I’ve done it before and I dare say I could do it again’.
Anyway I’d love to tell you the address but I can’t put it here. Cunning individuals may be able to Google it, but not yet because Google isn’t currently indexing it. So you could browse WHOIS databases in the hope of seeing something interesting, or type in addresses at random. Or decide you’re not particularly interested. THE CHOICE IS YOURS.
hmm
So against my plans the interview went rather well and I am at risk of being offered the job. But I don’t want it. It’s not really me, it’s not what I see myself doing. And they are a tiny new company who appear to have taken on more work than they can handle and so they would expect me handle a very high workload for a well below average wage.
It’s kind of weird because I get the impression that to the 3 other programmers working there it’s a bit of a good deal for them because they don’t necessarily have a degree. One certainly doesn’t and I’d be surprised if the other two did. I don’t mean to be elitist, it’s a nice route into work for a competent hobbyist php programmer who can’t afford university and wants to get into web development, but I’m a competent hobbyist programmer with two degrees and some pretty broad skills.
I’m also slightly bothered about what having that as the first post-uni job on my CV will look like. It may typecast me into php development which I really really don’t want to get stuck in. I mean if you have a couple of years web development in C# then you can probably apply for a C++ job quite happily, but if you have a couple of years web development in PHP then they’re going to be sceptical unless you can produce recent examples of code you’ve written. Which I can do now, but if I’m working full-time then I’m unlikely to produce much if any new code. PHP has a lot of negative connotations because it’s a scripting language intended to have a low barrier of entry, and these aren’t entirely unfounded. It just seems like starting with a php job is a bit like saying “I want to be a rally driver … I can already drive better than most people so I’ll start by getting a job driving buses”.
I am wondering if I later plan to get into harder science disciplines whether having experience only in softer ones will reflect badly on me. It’s all very well saying “but I have a degree in maths”, but if all I’ve done in the last 2 years is PHP then they’d be valid to question whether I really still had those maths skills. So maybe I should really be heading towards engineering jobs… which should make it easier to move around later on, if I want.
Obviously it would have made more sense if I had considered this before applying for it.
Why is nobody employing me.
I have applied for a bunch of jobs now and none has even rejected me. And I hardly think they’re getting many better candidates, WITHOUT MEANING TO TOOT MY OWN HORN, but it’s true. I applied for one last week which wanted skills in HTML, CSS, JavaScript, PHP, Python, SQL and Linux and I heard absolutely nothing back. I mean, hello, am I the same person who can write Python and Javsacript and PHP while using Linux in my sleep or DID I JUST IMAGINE THAT I AM THAT PERSON?
So the end to this story is I am bored of being ignored for jobs I am perfectly qualified for so today I applied for a job that I’m not really qualified for.
In other news I started leraning Haskell… which seems pretty cool. It’s a pure functional language which is pretty cool. Functional languages tend to be closer to the maths than C-like languages and they’re much more expressive. Much more cryptic too, but a lot more elegant once you get your head around it.
Here’s the naive factorial function in haskell (using the ghci interpretor).
Prelude> let factorial n = if n == 0 then 1 else n*factorial (n-1) Prelude> factorial 100 93326215443944152681699238856266700490715968264381621468592963895217599993229915608941463976156518286253697920827223758251185210916864000000000000000000000000
you can also do something like this:
Prelude> let fac 0 = 1 Prelude> let fac n = n*factorial(n-1) Prelude> fac 100 93326215443944152681699238856266700490715968264381621468592963895217599993229915608941463976156518286253697920827223758251185210916864000000000000000000000000
which is PRETTY NICE, you can define special/base cases like a piecewise function instead of having if/elses inside the function definition.
And then we can use its set builder notation to achieve crazy things:
quicksort [] = [] quicksort (x:xs) = quicksort [a | a <- xs, a<x] ++ [x] ++ quicksort [b | b <- xs, b>= x]
YEP that’s a quicksort in two lines and if you understand set builder notation it makes PERFECT SENSE (although it also helps if you know that in a list A = x:xs, x is the first element and xs is the rest of the list. The rest you should be able to figure out yourself.)
MIND == BLOWN.
I’m seeing a lot of parallels to some of the features Python I really like, but Haskell just blows it away. If you wanted to do that in roughly equivalent Python:
quicksort = lambda l: [] if l == [] else (quicksort(filter(lambda x: x<l[0], l[1:])) + [l[0]] + quicksort(filter(lambda x: x>=l[0], l[1:]) ))
something like that anyway. It’s not pretty. If you actually used this in a program somebody would shoot you.
the god delusion chapter 3
Continuing my exciting following of the God Delusion, chapter 3 concerns common arguments for god’s existence and rebuttals. I quite liked this chapter, although anyone with an interest in the subject will already have heard the main arguments listed here.
The most interesting of these arguments are the ontological proofs, which are clever mathematical/logical (I’ll treat these two words as interchangeable in this entry) arguments which proceed as follows:
1) Imagine a super perfect who cannot be any more super-perfect
2) Suppose said god does not exist
3) Said god would be even more perfect if he did exist
(3) contradicts (1), meaning (2) lead to a contradiction so it must be false. Therefore god must exist.
Dawkins ridicules this a bit, which I find pretty weak, because he can’t actually find a flaw in it. If you study maths at all you’ll see proofs like this all the time, it’s a perfectly valid proof and technique, although Dawkins seems to dismiss it as immature. Even so, this proof isn’t awfully convincing. I think the problem with this particular one, which makes it distinct from ones we are happy to see as valid, is that we are granting a mathematical proof the power to tell us things about our world. This is not what they are supposed to do. They are supposed to tell us things about a mathematical system. Many people think that maths is somehow woven into the fabric of the universe, but I think this is just a fairly naive Platonist interpretation.
The result (god does exist) is actually contradicted again by the fact that being ‘perfect’ in every way is impossible. One cannot be perfectly evil and perfectly good, and if you’re not both, you’re not perfect. Nobody claims god is perfectly evil (of course), but they do claim he is omniscient and omnipotent, and these two properties seem necessary for perfection. These two properties also imply a contradiction: if god is omniscient he already knows the course of the universe which means he cannot later change it, which means he is not omnipotent (as mentioned in the book). As does perfectly merciful and perfectly just: just implies punishing everyone as they deserve, merciful implies being a bit soft and not (as mentioned on Wikipedia). The fact that our system apparently allows two contradictory theorems shows that there’s something a bit wrong. [Or actually that's being a bit generous. Since overall perfection is contradictory and the above argument depends upon it, the argument is vacuous anyway. Although you can loosen your definition of perfection]
I think the actual fallacy in these arguments is in the word ‘imagine’. If you imagine a god who does not exist then you’re not imagining the best god possible. I can imagine an elephant that flies, but that doesn’t bring it into existence, not even if I set up an argument analogous to the above where its nonexistence is contradictory … imagine a super perfect elephant, suppose elephant lacks the ability to fly, uh oh then it’s not perfect so it must be able to fly, now suppose the elephant doesn’t exist, uh oh then it’s not perfect so it must exist. So where’s my flying elephant? Just for geographical clarity I can define that ‘being in my room’ is necessary for perfection. And yet still no flying elephant. An exercise to the over-enthusiastic reader is to investigate what happens when you want to introduce a second god who’s better than the first.
And this basically highlights why this stuff works in maths and not in real life: because maths is pretty much imaginary. We can imagine something, whether it’s the number 14, the square root of minus one, a circle or a universal Turing machine, as long as we can define it in words or symbols, it ‘exists’ as much as it needs to for it to be a valid, usable mathematical entity. That doesn’t mean it really physically exists or that it needs to; you can argue that ’14′ exists by giving me 14 apples if you want1 but you will find it more difficult to show that (-1)1/2 exists because it’s a nonsensical operation on anything we can show physically (yet we can handle it just fine in maths), and you would definitely have trouble physically showing a number greater than the number of atoms in the universe2 really exists, but it would still be perfectly valid to use it. My flying elephant is imaginary so it exists and is usable as far as the proof is concerned, but the proof is not able to reach out and effect a flying elephant into the real world.
Therefore a proof like this can assert the existence of something and be entirely correct logically, but have no link to real life, hence you have to be a bit sceptical when someone uses supposedly infallible reasoning to ‘prove’ some less than obvious statement about the universe without worrying about empirical evidence; it’s a clear sign they have a naive view of maths/logic.
I believe this also addresses Gödel’s stronger and more rigorous ontological proof, but it’s a bit cryptic and I haven’t sat down and gone through it.
_________
1. I’d say this was rather missing the point, but we define natural numbers in terms of sets so maybe not. Although technically numbers are defined in terms the empty set so maybe it is. Actually this was a really big problem: what is a number? we have all these things and we never said what they actually were. How can our maths be truly rigorous if we’ve not addressed what these things are? The solution is bizarre, clever, and thoroughly brilliant, we define them in terms of NOTHINGS:
0 = {} [the brackets denote a set, and its contents are separated by commas. In this case, there is nothing between the brackets because the set is empty and we call it the empty set]
n+1 = n U {n} [the joining together of n and the set containing n,
e.g. {0, 1} U {2, 3} = {0, 1, 2, 3} ]
so,
0 = {}
1 = {} U {{}} = {{}} = {0}
2 = {{}} U {{{}}} = {{}, {{}}} = {0, 1}
3 = {{}, {{}}} U { {{}, {{}}} } = {{}, {{}}, {{}, {{}}}} = {0, 1, 2}
and so on. Obviously it’s easier if you start substituting in the numbers rather than keeping the indecipherable lists of brackets, but I thought it was interesting to highlight that each natural number is defined by clever arrangements of nothing. This should convince you that maths is something quite separate from real life.
2. Assume this is finite. Even if it’s not, it is sufficient that the statement would be correct if the number was finite.
Fallout!
Bethesda announce new Fallout game for 2010! I read it on Slashdot.
Maybe i am just too old for computer games (I am almost 22, when I see fresh faced 18 year old first years who barely look 16 wandering around uni it makes me think JEEEEBUS, I was a first year once… how did this happen?)…
…but, here is my reaction: “…”. I am sorry Bethesda, it was fun while it lasted but Fallout 3 was hardly brilliant. The combat system was… there is only one word for this, and that word is “shit”. It would have been better scrapping the FPS element and making it third person and having a proper turn based combat system with proper action points instead of your pointless Max Payne VATS system instead. Because aiming perfectly at someone’s chest with a big hunting rifle only to see the wall flash 2 feet away from them because your small arms skill is only at 62% gets old REALLY fast.
In fact you know what, just make it 2d. I am not being facetious. It’s completely unnecessary for it to be 3d, or pretty. And by making it 3d they need a lot more time to create locations and people and content in general. And this showed in F3 where supposedly you could walk across the entire ‘wasteland’ in about 20 minutes, whereas in F1/2 it’d switch to a map view and plot your progress while the ingame clock progressed rapidly.
ALSO it was not very Falloutish. I mean, the Fallout universe has spelling mistakes, dark/obscure humour and MASSIVE power armour. F3 had few if any spelling mistakes, was pretty devoid of humour and power armour made you look about 5’6. For example in F3 you’d never come across a man in the middle of the wasteland infront of a long rickety rope bridge over a canyon wearing a wizard’s robe, and see your character announce to the world in general “I think I should save in a different slot right now”, then proceed to be quizzed on the intricacies of the Fallout 1 engine rules. Things like that make the difference between memorable and brilliant, but unfortunately F3 is more just bland.
The enemies were crap too. You have encountered a giant radscorpion! Which is near impossible to kill! That sucks. In F2 I’d regularly encounter packs of 8 or 9 radscorpions and I’d think “woot XP” and get out my sledgehammer. In F3 I’d think “oh shit” and run backwards shooting it in the face with the biggest gun I have any ammo for and then I probably walk right into another one. That was crap.
And the locations were pretty lazy. F2 had great places like Vault City and New Reno which were very memorable. So was the one with all the scientologists (was it San Diego? *looks at map* no it was San Francisco) and the New California Republic. Broken Hills, the mining town with Marcus the Sheriff, everyone remembers him… especially when he joined your party, and you’re on a cattle drive, and he opens up with his minigun and in one burst not only does he decapitate all your attackers, but he also slaughters all the cattle you’re supposed to be protecting. It’s all memorable.
OTOH F3′s only interesting city was, er, what was it called, the one with the bomb? Barter Town? no that was Mad Max? I don’t fucking know. That’s how memorable it is. Anyway, that’s the first city you see and you think “great!” but then after it turns out it’s the most complicated ecosystem in the whole game you feel a bit cheated. Rivet City was big and boring. The Republic of Dave was pointless. Umm. Paradise Falls was empty. And so on.
Bethesda have also trademarked the name Fallout for TV and Film. Oh no. Please don’t. There are already three Fallout films, they’re called Mad Max 1-3. Unless the third one really was as bad as I remember it in which case we’ll cut it to the first two.
please note it was actually a fairly good game it’s just that since it was a Fallout sequel it should have been no less than perfect.
I liked the Dogmeat and Harold reference, and the Mr Handys’ transition to voiced 3d were superb.
I wrote more about that than I intended to.
