TalkTalk Virus Alerts Scanning Engine

I was fiddling around today and noticed that as soon as I visit a website, a bot with the user agent “(TalkTalk Virus Alerts Scanning Engine)” visits the same site*. This is incredibly dubious on privacy grounds and I am not happy! There are two problems: first, it means my ISP is snooping on my HTTP requests, which isn’t cool at all; second, I don’t think it’s cool to actually visit those URLs either. They were private in the sense that they aren’t advertised anywhere and only exist for short periods of time. It’s not hard to imagine a case where a supposedly one-time script is executed twice and wreaks havoc on a server because an admin wrongly supposed that only he would execute it. To some extent that is his own fault, but I would also argue that if the URL isn’t linked publicly and only exists for a few seconds then it’s none of anyone else’s business, and in that case it shouldn’t be a completely unreasonable assumption that nobody else would know about the URL.

I don’t believe that an ISP has the right to visit a URL simply because one of their customers is visiting it. It comes down to authorised and unauthorised access to a computer system; the latter is illegal under the Computer Misuse Act. Just because an ISP’s customer has authorised access doesn’t mean that the ISP does as well. My ISP executed all of my testing scripts, and I’d definitely argue that that was unauthorised.

The sad thing is we are not even on TalkTalk… we’re on an ISP which was taken over by an ISP which is owned by TalkTalk.

Although, not for much longer.

I definitely recommend blocking the bot on principle because at the very least it’s a waste of your bandwidth, but blocking it is not trivial. I’ve seen it on two IPs (same subnet, but you can’t really block the subnet because they might use it for customers as well) and it also uses wget (a Unix command-line downloader) to download files (it seems to decide when to do this based on the file extension in the URL).

Update

25/6/11: Previously the bot was honestly identifying itself uniquely via its user agent. I blocked the user agent. TalkTalk has now changed it so it identifies itself as Internet Explorer 8. This is plainly dishonest: it is almost certainly not using IE8 to download your page, so it’s an intentional attempt to deceive servers and webmasters. I have had *a lot* of hits for this page via Google recently, so I assume that a lot of webmasters had begun blocking it and this is why TalkTalk has made it harder to identify.

IMO this gives the whole thing some legal standing (IANAL etc). As we know, unauthorised access to a computer system is illegal under the Computer Misuse Act. Blocking by user agent tells TalkTalk their bot is not welcome. TalkTalk now takes measures to work around the blocks and gain access to systems to which they had specifically been denied permission.

Blocking the bot via .htaccess (apache/litespeed)

Place the following code in your root .htaccess

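# Block the known bot IPs (Apache 2.2 syntax; Apache 2.4 needs mod_access_compat for these directives)
order allow,deny
deny from 62.24.181.134
deny from 62.24.181.135
deny from 62.24.222.131
deny from 62.24.222.132
deny from 62.24.252.133
deny from 80.40.134.103
deny from 80.40.134.104
deny from 80.40.134.120
allow from all

# Return 403 Forbidden for the bot's original user agent
# (RewriteEngine On is required unless it is already enabled earlier in the file)
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} TalkTalk\ Virus\ Alerts [NC]
RewriteRule .* - [F]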

Thanks to Jill and Anonymous in the comments for the IP list.

If you observe other IPs acting similarly, please post them in the comments. Also, if you can improve my .htaccess, or give instructions for other HTTP servers, please do.
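
For anyone who can’t edit the server config at all (shared hosting, other HTTP servers), the same check can be done inside the application. What follows is only a minimal sketch in Python, not something I run in production: the function name `is_talktalk_bot` is made up for illustration, the IP list is simply the one from this post and its comments (it may well be incomplete or out of date), and matching on the old user agent only catches requests that still identify themselves honestly.

```python
import ipaddress

# IPs reported in this post and its comments; treat the list as an assumption,
# since the bot may move and the addresses may be shared with customers.
BOT_IPS = frozenset(
    ipaddress.ip_address(ip)
    for ip in (
        "62.24.181.134",
        "62.24.181.135",
        "62.24.222.131",
        "62.24.222.132",
        "62.24.252.133",
        "80.40.134.103",
        "80.40.134.104",
        "80.40.134.120",
    )
)

# Phrase from the bot's original, honest user agent (compared case-insensitively).
BOT_UA_MARKER = "talktalk virus alerts"


def is_talktalk_bot(remote_addr: str, user_agent: str = "") -> bool:
    """Return True if the request matches a reported bot IP or the old user agent."""
    if BOT_UA_MARKER in user_agent.lower():
        return True
    try:
        return ipaddress.ip_address(remote_addr) in BOT_IPS
    except ValueError:
        # remote_addr was not a parseable IP address (empty or malformed).
        return False
```

A request handler could call `is_talktalk_bot(client_ip, user_agent_header)` and return a 403 when it is true. The IP check is what matters now that the user agent is disguised.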

____
* How do I know this? It was my site, so I can see the logs.
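
If you want to check your own logs for the same pattern (a visitor requests a page, then one of the bot IPs requests the same path shortly afterwards), it can be done mechanically. This is a rough sketch in Python under several assumptions: the log is in Apache common/combined format and in time order, the 120-second window is my guess at a sensible cut-off rather than anything TalkTalk documents, and `find_follow_ups` is a made-up name. The bot IPs are the ones listed in this post.

```python
import re
from datetime import datetime, timedelta

# Bot IPs reported in this post and its comments (possibly incomplete).
BOT_IPS = {
    "62.24.181.134", "62.24.181.135", "62.24.222.131", "62.24.222.132",
    "62.24.252.133", "80.40.134.103", "80.40.134.104", "80.40.134.120",
}

# Start of an Apache common/combined log line: IP, timestamp, method, path.
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+)'
)


def find_follow_ups(lines, window=timedelta(seconds=120)):
    """Yield (visitor_ip, bot_ip, path, delay) for each bot request that
    repeats a path some other IP requested within `window` beforehand."""
    recent = {}  # path -> (visitor_ip, time) of the latest non-bot request
    for line in lines:
        match = LOG_RE.match(line)
        if not match:
            continue  # skip lines that are not in the expected format
        when = datetime.strptime(match["ts"], "%d/%b/%Y:%H:%M:%S %z")
        ip, path = match["ip"], match["path"]
        if ip not in BOT_IPS:
            recent[path] = (ip, when)
        elif path in recent:
            visitor_ip, seen = recent[path]
            delay = when - seen
            if timedelta(0) <= delay <= window:
                yield visitor_ip, ip, path, delay
```

Feeding it the lines of an access log (for example `find_follow_ups(open("access.log"))`, where the file name is a placeholder) yields one tuple per suspected follow-up, which also shows which visitors are on TalkTalk.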

51 comments on “TalkTalk Virus Alerts Scanning Engine”
  1. Binx says:

    Very interesting post; I came across this after googling the TalkTalk user agent you mentioned.

    When you say ‘as soon as I visit a website…’, am I right in thinking that TalkTalk is either modifying the request or performing it on your behalf before returning you the result?

    Or are they simply retrying the request after you have completed it?

    I am worried that this is going to cause duplicate data on our site, which we use to bill our clients. However, it seems as though we cannot identify the request as coming from a bot, as we would with GoogleBot for example, because it could actually be the initial user request. Whereas if it was always going to be a duplicate request we could just ignore the action.

    Any extra explanation on your findings would be appreciated.

    • laeknishendr says:

      As far as I can tell, if a TalkTalk customer requests a page, it goes through as normal but TalkTalk will schedule their own bot to retrieve the page as well, which seems to happen a minute or so afterwards. So if you get traffic from a TalkTalk customer you’ll see their IP request your page, then shortly afterwards you’ll see the ‘TalkTalk Virus Alerts’ bot (on a different IP) request the same page. So they aren’t modifying or proxying it or anything like that, they are just duplicating the request, along, presumably, with any GET parameters.

      I don’t understand exactly what you are trying to do, but it should be possible to distinguish the bot vs a legitimate user by the user agent string.

      If you want to block the bot entirely, I’m using this in my .htaccess to return a 403 forbidden:

      RewriteCond %{HTTP_USER_AGENT} TalkTalk\ Virus\ Alerts [NC]
      RewriteRule .* - [F]

      but as I mentioned in the post, they also use wget (user agent: “Wget/1.9+cvs-stable (Red Hat modified)”) to request files, which is harder to block without false positives.

      I have seen the bot using the following IPs, but this may be on the same network as customers, so you have to be a bit careful about using it for identification:

      80.40.134.103
      80.40.134.104

  2. Victoria says:

    I found that user agent a month or so ago in my logs which I (sadly) check every day. Added it straight away to the ban list of browser summaries in my block plugin. Hasn’t stopped the thing visiting my site though, just means it cannot see the content.

    Thanks for the info though as I was wondering if I should unblock it; I think I’ll leave it blocked.

  3. James says:

    How can I block this user agent on IIS v6? I am hosting with Go Daddy and noticed this “TalkTalk Virus Alerts Scanning Engine” in my logs today as well. First time it has appeared.

  4. jrfoleyjr says:

    I just found them in my apache access.log today for the first time. They were pulling a copy of my robots.txt

    I see many web site operators in a high lather over being visited but what is actually scanned or indexed?

  5. […] I couldn’t find anything on the net about it until laeknishendr posted an interesting article https://laeknishendr.wordpress.com/2011/05/15/talktalk-virus-alerts-scanning-engine/. This particular browser summary appears to occur every time someone from the TalkTalk group of […]

  6. N says:

    I have found that the TalkTalk Virus Alerts Scanning Engine bot visits my sites but only requests the files which it believes could potentially present a danger to its customers, primarily JavaScript files.

    However, as a matter of principle, since I believe this is not how TalkTalk should be operating, I have now blocked the bot and, just in case the bot name changes, their IPs.

  7. N says:

    Incidentally, they also use this IP:
    62.24.252.133

    As well as those posted above:
    80.40.134.103

  8. Steve says:

    According to both the press, their website and the configuration settings within “My TalkTalk” account – Home Safe is disabled by default.
    However this does not actually appear to be the case, looking at my webserver logs it shows that the “TalkTalk Virus Alerts Scanning Engine” visited the page less than a minute after I had browsed the page myself. Given that I had just created this page, it was empty and was not linked to from any other source it would indicate that TalkTalk is tracking my browsing.

    XX.XX.XXX.XXX – – [03/Jun/2011:12:49:54 -0700] “GET /email.indigo-solutions.eu/test.html HTTP/1.1” 200 26 “-” “Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_7; en-gb) AppleWebKit/533.21.1 (KHTML, like Gecko) Version/5.0.5 Safari/533.21.1”
    62.24.181.135 – – [03/Jun/2011:12:50:26 -0700] “GET /email.indigo-solutions.eu/test.html HTTP/1.0” 200 6 “http://email.indigo-solutions.eu/test.html” “(TalkTalk Virus Alerts Scanning Engine)”

    I’ve reported this to TalkTalk and I’m awaiting a response. Not impressed to say the least.

    • laeknishendr says:

      The impression I got was that home safe is merely the mechanism which *informs* you whether or not a site you visit is considered harmful. They still visit/track/scan the pages you do regardless of home safe.

  9. gill says:

    The currently observed IP addresses used by the TalkTalk bot are:

    62.24.181.134
    62.24.181.135
    62.24.222.131
    62.24.222.132
    62.24.252.133
    80.40.134.103
    80.40.134.104

    More info and discussion:

    http://www.the-phoenix-broadband-advice-community.co.uk/index.php/topic,1828.1260.html

    https://nodpi.org/forum/index.php/board,20.0.html

    • B. Smith says:

      Just had my first visit about 4 hours ago and Googled “TalkTalk Virus Alerts” which got me to this post. I thought I would add to your list of IPs for “stalkstalk”, as I noticed some refer.

      80.40.134.120

      Also note: you can specify denied IP addresses in the following formats (for use in cPanel via the IP Deny Manager, if you have access to this function):

      80.40.134.120
      Single IP Address

      80.40.134.103-80.40.134.120
      Range

      80.40.134.120-199
      Implied Range

      80.40.134.120/32
      CIDR Format

      80.
      Implies 80.*.*.*

  10. ali says:

    We closely monitor logs for security reasons. Having noticed a lot of these GET requests from agents called “TalkTalk Virus”, alarm bells started ringing.

    After a quick Google, it would seem it’s a harmless enough bot that tends to pay particular interest to your site’s .js files, I guess to check the JavaScript isn’t attempting to do something nasty.

    Privacy? Well, yes, that’s a big bad no-no, and it should be an opt-in service, not an opt-out service.

    RE: Steve

    From my logs I can tell you that no matter how my customers used the checkout system, the TalkTalk bot did not request any https:// pages.

    Therefore, if you have private scripts activated by you hitting a URL, pop them on https://, even if it’s with a self-signed certificate.

    The other thing you can do is disallow ‘wget’ requests in your web server; in my apache2 this was just a quick .htaccess fix. There’s not much reason to ever allow Wget, though.

    In a way I wish this was also an ‘opt-in’ setting, not something every webmaster needs to disable.

    IP confirmed @
    logs/access.log: 62.24.252.133 “(TalkTalk Virus Alerts Scanning Engine)”

    • stephanie says:

      I have to disagree with blocking wget; it’s just a command line downloader and I use it frequently. I suppose if you are not hosting any *files* (rather just pages) then maybe it’s okay to block it, but I often prefer using it than my browser if a download will take any amount of time.

  11. Defyall says:

    Thanks for the post, I too have blocked the user agent from accessing my site.
    If it helps anyone, here’s how I’ve done it in nginx:

    if ($http_user_agent ~ "TalkTalk Virus Alerts") {
        return 403;
    }

  12. stephanie says:

    That’s certainly a lot cleaner than htaccess!

  13. AJ says:

    Thanks for the post!!!

    We just noticed (TalkTalk+Virus+Alerts+Scanning+Engine)@62.24.181.134 in our server log and were left wondering why it was there? And here it is explained, in this post :-)

    We are rather dumbfounded, though, as to why it is so obvious. Surely a Trojan distributor would filter out the IP range? If I was T.T. I would alias the bot as “AppleWebKit/534”, or use a cycle of user agents. But hey, we’re not paid that much.

    Maybe return a 405 ‘method not allowed’?

    A.J

    • stephanie says:

      Well, let’s be honest, the main reason the service exists is to sell more internet connections. Whether it works effectively is probably a secondary concern. As others have pointed out elsewhere, it scans the site *after* you’ve already visited it, so currently it’s broken by design whether or not they do it overtly.

      I don’t really understand what 405 is for, to be honest.

  14. Scottish says:

    lol stephanie, good point. I just re-checked the times in my logs and you’re right: the TalkTalk bot visits 40 or so seconds after the customer’s initial hit. And yet, let’s look at the TalkTalk ISP page…

    “Virus Alerts – Helps stop viruses before they reach your front door and alerts you if you visit a suspected site”

    ..”Exclusively available for TalkTalk customers free of charge, HomeSafe is built into our network and protects every device using your TalkTalk broadband. It’s simple to set up, there’s nothing to download or update and it won’t slow down your internet connection or computer…”

    “To make sure we’ve really understood what families today want, HomeSafe has been developed in partnership with our panel of parents and online safety experts.”

    The flaw in the idea that an ISP can protect its users by visiting these pages after the user has already done so should be obvious to any internet consumer, not just tech people…

  15. Adam says:

    I just noticed this in my 404 reports earlier; I was really confused why anything would be trying to read the URLs that it seems to think are there.

    Anyways, glad to know they’re not looking at https files.

    I’m glad I’m not with TalkTalk, I don’t understand how visiting sites after letting the

  16. Reece says:

    Hey, thanks for the post. I was looking over my reports and noticed this agent as well. I don’t intend blocking it right now, as it hasn’t committed any offences on my site that I am aware of. But I definitely agree with your comment on “principle”… it has no right to do what it is doing.

    Glad to be reading this entry; at least I know I am not alone with this agent. Will probably be saving the code you gave too, as I assume there may come a time I will want to use it.

    Thanks again =]

  17. Nick says:

    The damn bot just tried to place an order on my site by using the previously submitted info URLs.

    Time to ban the bugger.

  18. Nick says:

    I’ll dig through and find it.

    This is the error email I received when it happened.

    URL: https://www.###.co.uk/Securetrading/standard/redirect/
    IP Address: 62.24.222.131
    Time: 2011-06-21 21:44:55 GMT
    Error:
    Invalid target currency.

    • gill says:

      Many thanks for that.
      Did you notice if there were any other bots that also attempted to replay that same URL after that?

  19. Nick says:

    62.24.222.131 – – [21/Jun/2011:22:44:55 +0100] “GET /Securetrading/standard/redirect/ HTTP/1.0” 503 3875 “/Securetrading/standard/redirect/” “(TalkTalk Virus Alerts Scanning Engine)”

    Time in email is an hour behind, because it’s still set to send in GMT.

    No other bot followed to that url after that point. Just the ridiculous Twenga bot stealing bandwidth.

  20. Nick says:

    Just an FYI, 62.24.222.131 has hit the site 405 times already.

  21. Reece says:

    Okay, 2 days after posting my last comment, it’s being a pest. Time to block this bot. Thanks for the code, you helped me out :)

    • stephanie says:

      well Reece, I’m sorry to disappoint you but the instructions I gave will no longer work as the TT bot has changed its user-agent. But new ones are up :)

  22. For those who are blocking on the User-agent, it is now pretending to be something else. Most recent occurrences of this thing following me around on my own sites have this UA string:

    Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648; .NET CLR 3.5.21022; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; InfoPath.2)

    I wonder if it’s related to my requesting a MAC code and citing this bot as the reason?

    Incidentally, I don’t know why it bothers with robots.txt as it appears to ignore it.

  23. Avis says:

    TalkTalk IP 80.40.134.120 has changed its agent to “Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648; .NET CLR 3.5.21022; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; InfoPath.2)”

    Courtesy of H at Phoenix BAC

  24. Under Cover Mole says:

    Hi I have been wondering what has been going on for the last few days. I’m a talk talk customer and I have noticed that every time I visited my own blog I generated two page views. One straight away and the second a few seconds later. I really started to think I was being stalked and spent ages virus checking my computers on my home network.

    On my network I have a server accessible from the outside world and in the end using a proxy server I discovered that when I visited my server the so called talk talk bot followed me. How I found this page was from searching for the following browser string:

    (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648; .NET CLR 3.5.21022; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; InfoPath.2

    I just want to know how I can stop them following me around. I feel it is an invasion of my privacy. I feel stalked and I wonder if I should take it up with the police.

  27. Anonymous says:

    Just thought I’d add, it is also coming in from 80.40.134.120 using a cloaked user agent, so add that to your block list ;)

  28. scotlandshop says:

    Update on previous post..
    I decided not to block this agent by IP or agent string, as generally no harm is being caused by the bot hitting my site.
    Though I have just noticed another variety, this time by McAfee, with almost exactly the same behaviour. Some details here: http://iluvltd.co.uk/2012/05/daldcoutxb1-mcafee-com-hitting-my-site/

  29. Paul says:

    Constant scanning of my website over the past week, all from IP 62.24.252.133
    This is an IP address owned by Opal Telecom (but I think is just a subsidiary of TalkTalk).

    Since most of my site requires a user to be logged in, the bot immediately tries to access the same page a user has accessed, but since the bot is not logged in, it just gets a login prompt page instead of the real content.

    However, it is effectively doubling the traffic to my site from TalkTalk users.
    This practice of scanning the visits of their PAYING customers is completely despicable. Note that the bot used to identify itself honestly, and respect robots.txt.

    Now it seems it masquerades as Internet Explorer in an attempt to avoid detection, and no longer respects robots.txt telling it not to scan. There is no way to identify this bot other than by IP address (and it has many)

    I thought we got rid of all these scummy practices when all the adware/malware companies got sued into oblivion in the 90’s

    • me says:

      Yes, Opal is part of TalkTalk. I think it’s some kind of business broadband but we ended up on Opal by being with another ISP (Nildram) that got taken over.

  30. Kevin Varley says:

    Still active, same old IPs, pretending to be a ‘real’ user (faked UA):

    62.24.181.134
    62.24.181.135
    62.24.222.131
    62.24.222.132
    62.24.252.133
    80.40.134.103
    80.40.134.104

  31. Thanks for this post, I can confirm it’s still active and following me about.

    I was writing a stat-gathering program and could see it scan any new page I visited on my site. As it only seems to bother with new pages and not ones I visited again, I assume it’s scanning to add pages to its database as clean or not.

    Not sure if I should block it in .htaccess or put it into the ignore list of the stats program.

    Creepy behavior though.

  32. Aaron says:

    I’m getting this too.

    Here are the offending lines:

    62.24.252.133 – – [13/May/2013:13:01:00 -0400] “GET /robots.txt HTTP/1.0” 200 274 “MY_URL/robots.txt” “Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648; .NET CLR 3.5.21022; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; InfoPath.2)”
    62.24.252.133 – – [13/May/2013:13:01:02 -0400] “GET /index.html HTTP/1.0” 200 25152 “MY_URL/index.html” “Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648; .NET CLR 3.5.21022; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; InfoPath.2)”
    62.24.252.133 – – [13/May/2013:13:01:04 -0400] “GET /index.html HTTP/1.0” 200 25152 “MY_URL/index.html” “Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648; .NET CLR 3.5.21022; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; InfoPath.2)”
    62.24.252.133 – – [13/May/2013:13:01:04 -0400] “GET /MY_REAL_PAGE.html HTTP/1.0” 200 4176 “MY_REAL_URL/MY_REAL_PAGE.html” “Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648; .NET CLR 3.5.21022; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; InfoPath.2)”

    It’s clearly faking its user agent. Accessing multiple pages at the same time? A normal user agent reading robots.txt? Dodgy referrer links. And why on earth is it using HTTP/1.0?

    It’s a bit odd as it seems to appear from nowhere. Before the lines above I have only one thing, 30 seconds earlier, from a known UK spammer (http://www.stopforumspam.com/ipcheck/80.47.202.30).

    80.47.202.30 – – [13/May/2013:13:00:33 -0400] “GET /MY_REAL_PAGE.html HTTP/1.1” 200 1895 “MY_URL” “Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/13.0.782.112 Safari/535.1”

    I’m not sure if this is related.

    Are talktalk “running” this bot?

  33. Jane says:

    I run affiliate ecommerce sites and this really screws my clicks up. I mean, when I check sites/pages I am getting false visits from these ass holes! Sad they can’t get their shit right, and innocent people suffer with security or stats/EPC etc.
