Posts tagged “programming

I went running today and now I can’t walk.

WHAT HAPPENED, LEGS? WE WERE WORKING SO WELL TOGETHER.

In other news, VALVE MAKE TF2 WORK AGAIN. JUST ROLL BACK THE PATCHES FOR GOD’S SAKES. I am willing to relinquish my Manniversary hat, even if it did look rather fetching on my Demoknight. On SPUF (steam user forums) half the people there are defending Valve. Why? They pushed an update and it’s broken TF2 for everyone. Don’t they test these things before they release them? Don’t tell me Valve is “doing their best”; they evidently are not. If they were taking the care you’d expect of a big name software house, they’d run updates on their test system for 24 hours before pushing it out. The lack of any kind of official announcement that they’ve fucked everything up is only adding to the image of unprofessionalism they are currently cultivating.

IN OTHER OTHER NEWS, for lack of anything better to do, I have been writing a blogging system using CodeIgniter. Mostly the reason for this is that I have a news announcement feed which currently runs off WordPress but WordPress is extremely heavy and difficult to customise. The WordPress you see here powering my blog is not very similar to the WordPress you get when you download it and deploy it yourself. The WordPress I have here is very nice, it provides me with much useful functionality, it is well configured and I have no complaints. The WordPress you download is less impressive and 3rd party plugins are a horrible minefield, and even the good ones, well, I can’t be bothered to review them for security and stuff. WordPress from a developer’s point of view is utterly horrible.

So for fun I’ve been putting together my own simple blogging app over the last two days. It is horribly incomplete but it does the following things in some form or another:

User login and registration,
Post authoring/editing,
Tags (and tag browsing),
Comment authoring and administration and classification into approved/pending/spam/trash,
RSS post feed (works for tags as well),
a loose plugin system using a signals/slots idea
A plugin which throttles comments by IP by setting them to ‘pending’

hurrah?

The biggest problem I’ve had is that CodeIgniter has not scaled to a non-trivial SQL schema as well as I’d hoped. I mean, I’m still writing a lot of SQL and I hate writing SQL. I also hate dealing with the results of SQL and trying to put it into a form useful for my program. Next time I will look at using Cake.


building a tree from a flat definition (with a stack)

Something I find myself doing often is trying to build a hierarchical data structure (tree) from a flat definition of some kind. An example is that I have a simple formatting language whose parser outputs to HTML, and it can handle nested lists. The list is just a flat piece of text, but to convert it into valid HTML, it’s easiest to convert it into a tree and recurse down it. This is one of those situations where the right choice of data structure makes the resulting code a lot cleaner and more direct. I’ve had to do it in a bunch of other situations too, so it seems to be a semi-common task and it’s not immediately obvious how best to do it (unless I missed that class).

The way which seems to work best is to use a stack. Iterate through each of your elements in your definition and create a node for each one. The node needs to keep a list of children (and whatever other information is relevant). It doesn’t need to keep track of its parent. As you’re going through, put each node on the stack. When the node is complete*, pop it and append it to the children of the node which is now on the top of the stack. The last node you pop before the stack is empty should be the root node of your tree. Simples. You’ve now got a nice tree you can recurse down or whatever.

Notes:

  • It can be easier to put a dummy root node on the stack first. This is so that you don’t have to worry about popping the root node and it going out of scope.
  • At the end, you may find you still have a bunch of nodes still on the stack. Obviously you have to pop these and append them as children before the process is complete.
  • At any one iteration, your data definition might be such that you need to pop more than one node.

*Ascertaining when a node is complete depends how your data is laid out, but let’s say you’re using Pythonesque left-leading indentation, a node is complete as soon as the indentation level of the element you’re looking at drops below the indentation level that corresponds to your node.


GeSHi 1.2

I checked out the SVN branch of GeSHi 1.2 today. I looked at GeSHi very closely about a year ago and was less than impressed. A lot of people seem to rate it, and to be honest, this mystifies me. I ended up writing my own syntax highlighter, which is better than GeSHi, I think.

Anyway, I looked at 1.2 and it’s better than it was, but I’m still not overwhelmed. As far as I can tell, it has been re-written to emulate a state machine, which is definitely an improvement, but the code seems convoluted and it still doesn’t seem that powerful. I gave it some JavaScript and it was easy to make it confused on regex literals. JavaScript’s disambiguation for the ‘/’ symbol is very simple and well defined, but if your lexer is not powerful/extensible/explicit enough, it will not be able to check the condition and you’ll have to resort to hackish regular expression look-behinds (which won’t always work).

I looked at the code and to be honest I don’t really understand it, but it looks like it’s still using one central lexer to handle everything. When you aim to recognise more than a few languages, it seems ridiculous not to try to abstract and equate them, but that’s exactly what I didn’t do and I think it was the best choice I could have made. That’s the thing with abstraction – it can make life easier or it can make life harder. At worst it can be a total pain and and be a problem in its own right, but more often, it just limits you. The key is in choosing what to abstract, and the best answer is not always the most obvious. It pays to be clever in your design so you can write straight-forward code; the minute your code has to be ‘clever’ is the minute you need a new design.


making a custom fully working 404 page with codeigniter

[update: I changed this to use exceptions, it's better this way]

CodeIgniter has a really annoying bug with its 404_override setting which makes it pretty much useless as a complete solution. The bug is that it won’t load the correct libraries and helpers of your 404 controller if the controller which threw the 404 also loaded those modules for itself. Fear not, a workaround is at hand. The advantage of properly handling a 404 in the context of your CodeIgniter application is that you get access to your standard stuff like your site layout (views) so you can load all your menus and stuff and load the error message where the content would normally go – this way the visitor feels like they’re on the site and they can get onto a working page easily, if they so desire. You can also hook into your database and perhaps do some logging.

Firstly, if you’re not using MY_Controller then, err, well you really should be. It makes life so much easier. MY_Controller is an easy to define superclass for all of your controllers. Implementing it lets you centralise all your site-wide constants and routines, and it’s very useful here.

Define it in application/core/MY_Controller.php and make all your ‘real’ controllers extend it. Also define individual exception classes that you want to handle. The _remap method wraps the controller’s method invocation, so we override that and use it to catch exceptions. This means that at any point in your controllers, you can just throw an Exception404 and you don’t have to worry about anything else; the exception bubbles up to _remap which unloads any existing output and then loads the 404 view(s).

You could extend this to also define 403 exceptions, and any other error codes your application may deal with, and you could use the exception message (i.e. throw new Exception404(‘some message’);) to set some descriptive text for your 404 view. [update: actually I tried this and it's a bit cumbersome, you're probably best off having just a HTTPException class and set the status code as a constructor argument)]

This is sort of how mine looks:

// defined for neater syntax only
class Exception404 extends Exception {
}

public class MY_Controller extends CI_Controller {
  public function _remap($method, $params = array()) {
    try {
      if (!method_exists($this, $method))
        throw new Exception404();
      return call_user_func_array(array($this, $method), $params);
    } catch(Exception404 $e) {
      $this->show_404();
    }
  }
  protected function _load_header($data=array()) {
    $this->load->view('headerview.php', $data);
  }
  protected function _load_footer() {
    $this->load->view('footerview.php');
  }
  protected function show_404() {
    // clear any views that have already been loaded
    $this->output->set_output('');
    $this->output->set_status_header('404');
    $this->_load_header();
    $this->load->view('404.php');
    $this->_load_footer();
  }
}

CodeIgniter URLs work like this:

index.php/CONTROLLER/METHOD/ARG1/ARG2/ARG3/…/ARGN

So we can see here that when we request a method of a controller, the remap function catches it. If the URL is okay and the method exists, it is executed. If it doesn’t, the 404 is executed.

To make it so it also catches non-existent controllers, change application/config/routes.php:

$route['404_override'] = 'CONTROLLER/show_404';

CONTROLLER can be any controller which inherits MY_Controller. I use my default, you can use whatever most appeals or, if you want, define a new controller.

You may (probably) have your application and server set up so it’s possible to request URLs which aren’t handled by CodeIgniter’s index.php. In this case, you probably also want to redirect when Apache catches a 404 to your new custom 404. Put this in your htaccess:

ErrorDocument 404 /index.php/CONTROLLER/show_404

Where again, CONTROLLER is some controller that implements show404.

you should now have site-wide redirection of file-not-found URLs to your exciting and sexy new 404 page. I thoroughly recommend using the opportunity to display to your baffled visitor a picture of a lolcat. I recommend this one:

or (better),


php performance woes

I found a fairly strange case of php performance scaling today. I’m sorry, but I’m not going to try to separate this out of a several thousand line codebase to reproduce the exact conditions it’s happening, so we’ll just look at the snippet that directly affects the performance.

The function at fault is building a tree via a stack, the algorithm probably has a name but I don’t know it. It’s not important. We’re inserting some text into the tree as a child of current node, and we’re trying to avoid adjacent separate strings by concatenating them. As it happens, the concatenation here was later rendered redundant.

The function looks like this:

<?php   
  $children = & $this->node_stack[count($this->node_stack)-1]['children'];
  if (!empty($children) && is_string($children[count($children)-1])
    $children[count($children)-1] .= $str;
  else
    $children[] = $str;

you notice what about this code? It’s not entirely optimal, correct! But unless you’re a micro-optimisation nazi, you’ve probably misjudged the real impact it has. With this code, a 30KB input runs in about 0.55 seconds and a 60KB input (the first input doubled up) runs in 3.5 seconds. A 200KB input triggers a 60 second timeout. Err, clearly that’s not very good..

If you remove lines 2, 3 and 4, (these are redundant, remember) everything runs perfectly linearly: 30KB runs in 0.3 seconds, 60KB runs in 0.6 seconds and 200KB runs in about 2.0 seconds.

The problematic part turns out to be the call to count($children). The two following functions have 1) linear and 2) exponential(ish — didn’t plot a graph) runtime

<?php
// linear 
function fast($str) {   
  $children = & $this->node_stack[count($this->node_stack)-1]['children'];
  $children[] = $str;
}
// exponential (probably)
function slow($str) {
  $children = & $this->node_stack[count($this->node_stack)-1]['children'];
  $children[] = $str;
  count($children);
}

This *really* shouldn’t happen. Everyone knows that count is O(1). And we’re also calling count on the node stack with no adverse effect.

Here’s the bizarre thing, as generated by the profiler on the same input:

fast: called 7924 times, 2.59% of program runtime
count (as callee of fast): called 7924 times, 5.23% of fast()

slow: called 7924 times, 42.3 % of program runtime
count (as callee of slow): called 15848 times, 0.55% of slow()

yeah! Read that again: count takes up a lot less relative time in slow() even though it’s called twice as many times, even though its presence magically causes the calling function to slow down massively. According to the profiler, the absolute time taken for count() is about 3x as much slow() vs fast(), we expect it to be 2x, but 3x is probably within error bounds.

It’s safe to say I haven’t got a clue what’s going on here.
MYSTERY.

this is php 5.3 and xampp on Ubuntu x64. I’ll find out exactly which version of php if anyone cares.

err edit, this seems to be it: http://bugs.php.net/bug.php?id=34540… OMG PHP is retarded sometimes. “Thank you for taking the time to write to us, but this is not
a bug.”
umm yes it is.


regular expressions: the cause of and solution to, all of life’s problems.

uhum, a few years ago I wrote a regular expression to try to approximate the number of syllables in a word.
Today I dug it out and decided it was not good enough! Instead of approximating the number of syllables, let’s try to make it match the full syllables!

The result follows. It’s in Python re module syntax, which is pretty much PCRE but you’ll notice python lets you do non-fixed length assertions (huzzah), which PCRE does not (boo). Also I apologise for some bad constructs but I wrote it several years ago. All {0,1}s should be ‘?’ and all x|y should be [xy], but when your regexes get this big you get wary of changing them.

You have to split your input by re.split(‘[^a-z]‘) because the regex operates at the word level. Then you do an re.findall() with the re.X/re.VERBOSE flag set.

(?:
  [bcdfghjklmnpqrstvwxz]*
  (?:
        [aeiou]+[^aeiou]e(?=[aeious])
        |
        #Easy vowels first, each may be repeated up to 3 times.
        (?:
            #i may not be part of ing, eg 'trying' is two syllables but here would be counted as one. 
            # ing is addressed below.
            # likewise ia  iod ua are generally pronounced as two syllables (deviation, period, usual). 
            # These are excluded here to  remove from the repetition qualifier. 
            a|i(?!:ng|a|o)|o|u(?!a|se)|y   
            |oe

            #e is not so easy, and to avoid the repeatition qualifier it is also addressed underneath.
            # match e if not on the end of a word like 'note', or plural forms, like 'notes'. 
            | (?:e(?!(?:s){0,1}$) )  
            
        ){1,3}
        
        # now we address the possibility of 'e' being on the end of a word and does qualify for a syllable, 
        # for example 'maybe', or 'couple'
        #Do match e at the end if preceding letters were not such that a silent e is likely: 
        # >1 consonants EXCEPTION: r two letters before the e, as in 'barge', 'nce' as in variance 
        #     errrr shouldn't the last subpattern be a look AHEAD?
        | (?<=[^aeiour][^aeiou])(?:e[sd])(?:[\s]|$)(?<!nce)
     
        #and we rejected e when on the end of word or a plural, but what about voices or aces.
        |ces$

        #short words ending in e or a, like 'me', 'the', 'sea', etc. 
        | ^[a-z]{1,2}[ea]$

        #excluded exceptions from earlier. Note non-consumed following letters to avoid overlapping.
        
        | ing    | i(?=a)  
        | i(?=o) | u(?=a)(?!are)
        
  )
  (?:
    (?:
      # bah this needs refactoring
      bb|dd|ff|gg|ll|mm|nn|ss|tt
      |ch|sh|ght?|lt|th|rn|st|ft|nd|rt|rd|gn$|nt|ct|(?<=[aeiou])xp|lf
      |ng|ck|rst|lp|ld|rp
      |
      [bcdfghjklmnpqrstvwxz]?
    )?
    (?:s$)?
  )?
)
|.+

I ran it on an excerpt of a Pirates Of the Caribbean article from Wikipedia and it split the words as follows:

unintelligible  ['un', 'int', 'ell', 'ig', 'ib', 'le']
relinquishes    ['rel', 'in', 'quish', 'es']
intelligence    ['int', 'ell', 'ig', 'en', 'ce']
negotiations    ['neg', 'ot', 'i', 'at', 'i', 'ons']
interceptor     ['int', 'er', 'cep', 'tor']
encouraging     ['en', 'cour', 'ag', 'ing']
intelligent     ['int', 'ell', 'ig', 'ent']
calculating     ['cal', 'cul', 'at', 'ing']
vocabulary      ['voc', 'ab', 'ul', 'ar', 'y']
lieutenant      ['lieut', 'en', 'ant']
previously      ['prev', 'i', 'ous', 'ly']
deciphered      ['dec', 'ip', 'her', 'ed']
acquiesces      ['ac', 'quies', 'ces']
norrington      ['nor', 'ring', 'ton']
desiderata      ['des', 'id', 'er', 'at', 'a']
admiration      ['ad', 'mir', 'at', 'i', 'on']
strategies      ['strat', 'eg', 'i', 'es']
intentions      ['int', 'ent', 'i', 'ons']
proclaimed      ['proc', 'laim', 'ed']
grappling       ['grap', 'pling']
elizabeth       ['el', 'iz', 'ab', 'eth']
evidenced       ['ev', 'id', 'en', 'ced']
advantage       ['ad', 'vant', 'ag', 'e']
returning       ['ret', 'urn', 'ing']
seemingly       ['seem', 'ing', 'ly']
commodore       ['comm', 'od', 'or', 'e']
actuality       ['act', 'u', 'al', 'it', 'y']
consensus       ['con', 'sen', 'sus']
announces       ['ann', 'oun', 'ces']
swordsman       ['sword', 'sman']
suggested       ['sugg', 'est', 'ed']
resulting       ['res', 'ult', 'ing']
determine       ['det', 'er', 'min', 'e']
indicated       ['ind', 'ic', 'at', 'ed']
persuades       ['per', 'suades']
murderous       ['murd', 'er', 'ous']
negotiate       ['neg', 'ot', 'i', 'at', 'e']
extremely       ['ex', 'trem', 'el', 'y']
mutinied        ['mut', 'in', 'ied']
stabbing        ['stabb', 'ing']
explains        ['exp', 'lains']
attitude        ['att', 'it', 'ud', 'e']
immortal        ['imm', 'ort', 'al']
contrast        ['cont', 'rast']
although        ['alt', 'hough']

Not bad?


let’s do programming #2

Today we’re going to be implementing a sine approximation algorithm in LOLCODE, using a Taylor series expansion.

HAI   

BTW this function calculates the exponential: num1^num2
HOW DUZ I POWER YR NUM1 AN YR NUM2
  I HAS A COUNTER ITZ 0
  I HAS A ACCUM ITZ 1
  IM IN YR LOOP UPPIN YR COUNTER WILE COUNTER SMALLR THAN NUM2?
    ACCUM R PRODUKT OF NUM1 AN ACCUM
  IM OUTTA YR LOOP
  FOUND YR ACCUM
IF U SAY SO

HOW DUZ I FACTORIAL YOUR_NUMBER
  BOTH SAEM 0 AN BIGGR OF YOUR_NUMBER AN 0, O RLY?
    YA RLY
      FOUND YR 1
    NO WAI
      FOUND YR PRODUKT OF YOUR_NUMBER AN FACTORIAL DIFF OF YOUR_NUMBER AN 1
  OIC
IF U SAY SO


HOW DUZ I SIN X

  I HAS A OUTPUT ITZ X
  I HAS A COUNTER ITZ 3
  I HAS A SIGN ITZ -1
  I HAS A LIMIT ITZ 11

  IM IN YR LOOP
    COUNTER BIGGR THAN LIMIT?, O RLY?
      YA RLY, GTFO
    OIC
    
    I HAS A NUMERATOR ITZ POWER X COUNTER
    I HAS A DENOMINATOR ITZ FACTORIAL COUNTER
    I HAS A TERM ITZ QUOSHUNT OF NUMERATOR AN DENOMINATOR
    
    TERM R PRODUKT OF TERM AN SIGN
    
    OUTPUT R SUM OF OUTPUT AN TERM
    
    COUNTER R SUM OF COUNTER AN 2
    SIGN R PRODUKT OF SIGN AN -1

  IM OUTTA YR LOOP
  FOUND YR OUTPUT
IF U SAY SO


I HAS A PI ITZ 3.14159265
VISIBLE SIN PRODUKT OF PI AN 0
VISIBLE SIN PRODUKT OF PI AN 0.25
VISIBLE SIN PRODUKT OF PI AN 0.5
VISIBLE SIN PRODUKT OF PI AN 0.75
VISIBLE SIN PI
VISIBLE SIN PRODUKT OF PI AN 1.5

KTHXBYE

I’m sure you all have LOLCODE interpretors but in case you haven’t, here’s the output from mine.

sin(0)       = 0
sin(pi/4)    = 0.7071067805450276
sin(pi/2)    = 0.9999999437410517
sin(3*pi/4)  = 0.7070959919948316
sin(pi)      = -0.0004451566418505852    [should be 0]
sin(3*pi/2)  = -1.0818902096280751  [should be -1]

Anyone who’s ever done any maths will know that if you see the number 0.707…, it means you’ve done it right. Sin(pi/2) is supposed to be 1, that’s just a floating point thing. The last two are beginning to diverge from usefulness though. The Taylor series method yields the best results around x=0. By the time you get past pi or -pi it starts to get a bit dubious.

In a bizarre way, LOLCODE reads like English more than most scripting languages. This has the unexpected downside that when you try to read it, you can’t easily distinguish between language syntax and logic. Highlighting would help, obviously, but I think the fact it’s all done in words and not symbols means you have trouble looking at it and pulling out what you want to know. You end up reading through it and parsing it as English. Maybe my brain just isn’t used to tokenising text like this and it would adapt to it.

I really love using ‘GTFO’ as a break statement. Counter’s bigger than 10, OH REALLY? YEAH REALLY, GET THE FUCK OUT. Hilarious o.O


let’s do programming

Brainfuck interpretor in Python. Kind of fun.
I hereby declare this code to be free under the WTFPL.

#!/usr/bin/env python

import sys

def bf(code):
  input_ptr = 0
  tape_ptr = 0
  tape = [0] * 30000

  while input_ptr < len(code):
    char = code[input_ptr]
    if char == '>' : tape_ptr += 1
    elif char == '<' : tape_ptr -= 1
    elif char == '+' : tape[tape_ptr] += 1
    elif char == '-' : tape[tape_ptr] -= 1
    elif char == '.' : sys.stdout.write(chr(tape[tape_ptr]))
    elif char == ',' :
      try: tape[tape_ptr] = ord(raw_input()[0])
      except: continue
    elif char == '[' and not tape[tape_ptr] or char == ']' and tape[tape_ptr]:
      stack = 0
      while 0 <= input_ptr < len(code):
        input_ptr += 1 if char == '[' else -1
        if code[input_ptr] == char: stack+=1
        elif code[input_ptr] == '[' or code[input_ptr] == ']':
          if stack: stack -= 1
          else: break
      if stack: raise Exception("Mismatched brackets")

    input_ptr += 1

if __name__ == '__main__':
  code = ""
  if len(sys.argv) < 2 or sys.argv[1] == '-':
    code = sys.stdin.read()
  else: code = sys.argv[1]
  bf(code)

Usage:

$ ./bf '++++++++++[>+++++++>++++++++++>+++>+<<<<-]>++.>+.+++++++..+++.>++.<<+++++++++++++++.>.+++.------.--------.>+.>.'
Hello World!

yes, it’s a brilliantly stupid programming language. You get a tape memory, which you can traverse left and right with < and > respectively. You can increment and decrement the current tape element with ‘+’ and ‘-’. IO is ‘,’ and ‘.’, and [...] denotes a while loop. The loop’s condition is that the current tape element is non-zero. And that’s all there is to it.


IPs as evidence

I am reading a lot of things to do with file sharing and flawed evidence lately. One of the common misconceptions is that IPs can be arbitrarily spoofed. This isn’t really true. You can spoof your IP but you won’t receive any packets, so it’s kind of a pointless exercise unless you’re doing something cunning. It’s not practical for web usage. It’s very hard to do it most of the time because the data transfer protocol you use will most likely be built on TCP which requires a forwards-backwards handshaking procedure to establish a connection first, and it’s hard to bluff your way through the handshake if you don’t know what the other person’s sending back. Or as nmap says: TCP Sequence Prediction: Difficulty=204 (Good luck!).

But I think this is all missing the point a bit. The IP in the first place is such flimsy and meaningless evidence. There are too many humans in the chain who stand to gain from just making them up and it’s trivial to do so. Between a law firm and a ‘monitoring’ firm, you’ve got two separate entities whose entire business depends on being able to come up with a set number of IP addresses per month. It isn’t a difficult task to add on an extra 10% to ensure you get a nice bonus. It isn’t even a difficult task to fictitiously generate 95% of your data.

If I was doing this, my first thoughts would be like so: I’d probably proceed by recompiling a torrent client to write a list of IPs for each torrent to a text file (filtering by UK IPs belonging to ISPs who don’t protect their customers), then if I was a greedy cretin I’d most likely bump up the number like so:

#!/usr/bin/env python
import commands
import random
import re

ip32 = lambda n: sum([int(x)<<8*(3-i) for i, x in enumerate( (n.split(".")) ) ])

def random_ip_from_subnet(network, subnet):
  ip = 0  
  for i in range(32)[::-1]:
    snbit = (subnet >> i ) & 1
    ip = ip << 1
    ip |= (network >> i) & 1 if snbit else random.randint(0, 1)

  return ip

users = [] # read user list in here as list of (ip, date) tuples
newusers = []

for ip, date in users:
  newusers += [(ip, date)]
  subnet = commands.getoutput("whois {0} | grep route | awk {{'print $2 '}}".format(ip)).trim()
  if re.match("^(\d{,3}.){3}\d{,3}/\d+$", subnet):
    [network, mask] =  subnet.split("/")
    networkip = ip32(network)
    maskip = 2**(32-int(mask))-1
    
    for i in xrange(3):
      fakeip = random_ip_from_subnet(networkip, maskip)      
      fakedate = date + (random.random() - 0.5) * (60*60*24*28)      
      newusers += [(fakeip, fakedate)]

# write newusers to file here.

(note: untested code)

And there we go, I’ve just increased my income by 300% in about 30 lines of code. Each IP I generate is on the same ISP as one of the ones I’ve really captured, so I know that I’m not going to have to submit multiple court orders. Some of these will likely not be allocated at the time of the alleged infringement, so the ISP won’t be able to identify them to a user, but IPv4 is oversubscribed so we’re not just taking wild stabs in the dark. A lot of these will be allocated. I’ve looked through ACS:Law’s spreadsheets and on one of the BSkyB ones, 1387 out of 5699 records are ‘unknown’. On another 1325 out of 5358 are unknown. What’s going on here? It doesn’t say why they’re unknown, but it’s a massive failure rate.

The basic idea of this code is that it takes a known IP, it then looks up that IP’s WHOIS record which tells us what range (subnet) it’s on. Then we start generating IPs at random from that range. We also randomise the date within a 14 day range (although if I was making a serious effort, I’d then truncate the hours/minutes/seconds and instead choose time of day randomly from a non-linear distribution to favour evenings during the week (negatively skewed normal distribution, mean 8pm) and daytime at weekends — this would take an extra 10 lines or so but I’d have to read up on both the random and date modules so I didn’t bother). Also if I was making a serious effort I’d drop or regenerate IPs whose last byte is 0 or 0xFF.

When someone says “I’ve got an IP address” the correct response is not “alright here’s a court order to get the ISP to tell you who it belongs to so you can send them a letter demanding £500″; the correct response is “so?”.


anagram generator in python

it helps me with the crosswords :p

def ramnaga(letters, word = ""): 
  if len(letters) == 0:
    if is_word(word):
      return [word]
    else: return []

  words = []
  for i in range(len(letters)):
    newletters = letters[0:i] + letters[i+1:]
    l = letters[i]       
    words += ramnaga(newletters, word + l)
  return words

this would help of course if wordpress didn’t mess up the indendation [edit: ahh sourcecode mode] [edit2: untested modification, watch out!].

implementation of is_word() is left as an exercise to the reader. You obviously need a wordlist (*nix users can look in /usr/share/dict/words. Windows users can scour the internet because their OS is rubbish and doesn’t come with anything useful. I recommend using the ‘set’ object to store it because it automatically removes duplicates and it provides O(1) membership testing, which is all you care about). If you remove the is_word condition, it’s just a standard permutation generator but it’ll use a tonne of memory (easily 350MiB on a 10 character word) storing gibberish before you can filter it.
On my CPU it takes about 25 seconds on a 10 character word and about 250 on 11 letters. It’s impractical above that. If you reimplented it in C you might squeeze a 12 letter word out, but there’s a limit to how fast you can get an inherently O(n!) algorithm. I’d be interested to know if you could use some kind of coding theory to figure out permutations based on hamming distances, or something. It seems then that your complexity would be in terms of the size of the dictionary?

call as follows:


>>> words = list(set( ramnaga("ramnaga") )) # list(set()) to uniqueify elements
# and convert back to a list
>>> words
['anagram']

written in about 4 minutes due to the extremely rare good luck of getting a recursive problem right first time.


Follow

Get every new post delivered to your Inbox.