View Full Version : MobiPocket eNews Creator Banned from Thoughts Media Server


Jason Dunn
05-21-2006, 04:35 AM
It's with a great sense of frustration that tonight I implemented a robots.txt ban of the Mobipocket eNews Creator bot. If that doesn't work I'll be implementing an IP address block. It's been a rough week for our server - we've had a lot of traffic and the server has been slow and unstable because of it. I dug into the logs to see where the traffic was coming from, and I was shocked to see huge amounts of traffic coming from a robot identifying itself as "eNews Creator". How huge? On Wednesday the 17th of this month, the eNews Creator bot hit Pocket PC Thoughts.com 37,499 times alone. Half a gig of bandwidth in one day. The entire month of April? Our server was hit 938,472 times, for a total of 18.55 GB of bandwidth. To put those numbers into perspective, the Google bot only hit us 431,010 times in the same month and only used 3.29 GB of bandwidth.

I don't take this action lightly, because there are evidently people interested in using the Mobipocket client to read news from Pocket PC Thoughts. The problem is that Mobipocket has evidently created their software without regard for the servers that they are scraping. They created it to hit servers without respecting the rights of the publisher. No RSS/scraper client needs to hit a server 37 thousand times in one day, period.

Upon discovering this information, I attempted to contact Mobipocket to resolve the matter. I scoured their Web site looking for some way to contact them directly - nothing. No email addresses, no contact form - they direct everything into the forums. I managed to find one email address on their privacy page, but upon emailing it I received an autoresponder stating that they did not monitor the alias and only responded to posts in the forums. :evil: I tried webmaster@ and postmaster@ - two email aliases that should always work. Neither did - both bounced back because the aliases do not exist. I sent two private messages to Mobipocket employees in the forums on Thursday, asking them to respond to this issue, and 48 hours later I haven't received a response. I tried sending a private message to the forum admin - they've configured the admin account to refuse all private messages. :roll:

If you're a Pocket PC Thoughts reader that uses the Mobipocket client, I'd encourage you to contact them and point them to this posting. I'd like to find a way to resolve this. In the meantime, use NewsBreak (http://tinyurl.com/k4sd5) and subscribe to our RSS feed (http://www.pocketpcthoughts.com/xml/).
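
For reference, a robots.txt ban of the kind described in this post can be sketched and sanity-checked with Python's standard urllib.robotparser. The User-agent token below is an assumption based on the bot identifying itself as "eNews Creator"; the exact string it sends may differ, and the rule only helps if the client chooses to honor robots.txt at all.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules. The "eNews Creator" token is assumed from
# the name the bot reports in the server logs, not taken from Mobipocket docs.
ROBOTS_TXT = """\
User-agent: eNews Creator
Disallow: /
"""


def is_allowed(user_agent: str, url: str) -> bool:
    """Return True if ROBOTS_TXT permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    feed = "http://www.pocketpcthoughts.com/xml/"
    print("eNews Creator allowed:", is_allowed("eNews Creator", feed))
    print("Googlebot allowed:", is_allowed("Googlebot", feed))
```

A well-behaved crawler would run a check like this before every fetch; nothing on the server side enforces it.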

code-frog
05-21-2006, 04:53 AM
Maybe this will be helpful, though I doubt it based upon what you've expressed.

http://www.networksolutions.com


Registrant:
mobipocket
104 av. President Kennedy
PARIS, Ile de France 75016
FR

Domain Name: MOBIPOCKET.COM

Administrative Contact:
brethes, thierry [email protected]
Mobipocket.com
104 av. President Kennedy
PARIS, Ile de France 75016
FR
+33 1 44 14 15 52 fax: +33 1 44 14 15 56

Technical Contact:
Network Solutions, LLC. [email protected]
13200 Woodland Park Drive
Herndon, VA 20171-3025
US
1-888-642-9675 fax: 571-434-4620

Record expires on 01-Feb-2010.
Record created on 01-Feb-2000.
Database last updated on 20-May-2006 23:51:14 EDT.

Domain servers in listed order:

NS81.WORLDNIC.COM 205.178.190.41
NS82.WORLDNIC.COM 205.178.189.41

Registry Status: REGISTRAR-LOCK
Registry Status: clientTransferProhibited
Registry Status: clientDeleteProhibited
Registry Status: clientUpdateProhibited

Domain Name: MOBIPOCKET.COM
Registrar: NETWORK SOLUTIONS, LLC.
Whois Server: whois.networksolutions.com
Referral URL: http://www.networksolutions.com
Name Server: NS82.WORLDNIC.COM
Name Server: NS81.WORLDNIC.COM
Status: REGISTRAR-LOCK
EPP Status: clientTransferProhibited
EPP Status: clientDeleteProhibited
EPP Status: clientUpdateProhibited
Updated Date: 07-Jan-2004
Creation Date: 01-Feb-2000
Expiration Date: 01-Feb-2010

>>> Last update of whois database: Sat, 20 May 2006 23:51:44 EDT <<<

Jason Dunn
05-21-2006, 04:57 AM
Maybe this will be helpful I doubt it based upon what you've expressed.

Thanks, I'll give it a try - message sent. Though given the way they reject all other incoming email, I'd be surprised if they check email on that alias.

code-frog
05-21-2006, 05:00 AM
Theoretically they have to. If you don't regularly check the email address registered for your domain, your DNS provider can make changes that will break all of your stuff, and you won't know because you never read the email they sent two weeks before the change.

I noticed a French address on all the contact info, so you might even get a response tonight if someone is at least polling that address.

The next step would be to register a complaint with Network Solutions as they are listed as the technical contact. While Network Solutions is dreadfully slow by email/web a phone call can work wonders.

If you want/need additional help on this let me know. I'm quite happy to do this stuff as it's ... well ... something I do a lot of. :)

- Rex

Jason Dunn
05-21-2006, 05:12 AM
The next step would be to register a complaint with Network Solutions as they are listed as the technical contact. While Network Solutions is dreadfully slow by email/web a phone call can work wonders.

Nah, no need for that - we can just block them at the firewall level and that's that. I have this suspicion that anyone who would design a software tool like this would also design it to ignore robots.txt, so I may have to IP ban them.

code-frog
05-21-2006, 05:16 AM
Banishing the IP would be the most direct route. I think they would be getting some complaints on that maneuver. :)

Well, I'm definitely curious to see if they are a manned operation checking those addresses. If they aren't, that puts them in the "suspicious" category as a web-based company with web-based products. Wonder if Mark Russinovich has had any contact with them. :D

Jon Westfall
05-21-2006, 05:44 AM
This software strikes me as the work of a lazy programmer more than an irresponsible company (of course, lazy programmers probably work at irresponsible companies). It would never be in a company's best interest to destroy the servers of the content providers they hope to bring to their customers. Seems to me the more likely thing is that the software was just written poorly (Ok, VERY poorly) and now that the company realizes that, they've fortified themselves by directing everything to useless forums. My guess is that we aren't the first site to have this happen, and that they've probably honed their skills of avoidance.

Therefore, I think blocking them at our level is best, and moving on. I realize some people want to help us out, but I don't think Jason is ready to take it beyond a block, and I don't think it really makes much sense.

Of course, I may be wrong, and Jason may feel especially vengeful tonight :twisted:

lapchinj
05-21-2006, 07:18 AM
This software strikes me as the work of a lazy programmer...

Therefore, I think blocking them at our level is best, and moving on. I realize some people want to help us out, but I don't think Jason is ready to take it beyond a block, and I don't think it really makes much sense.

Of course, I may be wrong, and Jason may feel especially vengeful tonight :twisted:
Sounds like a quick and dirty web scraper that someone threw together (a non-programmer type) and never took the time to check out if it was working correctly (or really didn't care). Hitting a server 37k times in a day is kind of sleazy, to put it nicely. Bandwidth costs $$$$, and it's through the kindness of the host that some bot gets to scrape its pages; it's not some idiot's right to do it without regard for the damage done by it.

My own 2 cents says just block the IP and move on. The fact that there is no way to contact him/her/them or their offices (if they even exist) just goes to show you that the admin of MobiPocket is an irresponsible person who doesn't care much about anything that goes on outside their own perimeter.

Jeff-

Janak Parekh
05-21-2006, 07:19 AM
The irony is... I really like Mobipocket as an ebook reader; in fact, it's my default ebook reader. However, their desktop tools do seem a little less polished/mature. :(

--janak

dommasters
05-21-2006, 09:08 AM
I'd have your blood pressure checked after that experience Jason. You couldn't make it up 8O

francks
05-21-2006, 09:11 AM
FYI, Mobipocket is now owned by Amazon. You could try that route. In any case, the eNews scraper feature has now been abandoned by Mobipocket. That was a nice feature which never quite took off. With RSS now so prevalent and web services from both Microsoft and Google to mobilize web pages, the need for the scraper is much less than before.

The next version of the eNews creator only supports RSS, so it's a matter of time until the problem goes away... Until then your solution is probably the best one. If you support full text RSS no user would ever need to use the screen scraper anyway.

frankenbike
05-21-2006, 09:28 AM
Hitting about every 2.5 seconds? That really is some dumbass programming.

Menneisyys
05-21-2006, 01:36 PM
As the author of HTTP filter & Mobipocket Web Companion Support Pack (http://www.winmobiletech.com/mwc), I know the inner secrets of the Web Companion, the communication and the way it collects news quite well.

(Note that I don't know any 4.8+ versions of Web Companion. They may have switched protocol in the meantime. I don't even know if they have introduced a central server-based solution instead. When I, some 3 years ago, worked on the Support Pack, they denied any kind of central cache server because of the EU laws that don't allow for any kind of content caching.)

Indeed Web Companion may be a real pain in the back as it downloads new content quite often. This is why for example The Register banned Web Companion some 37-38 months ago.

There is a solution, though. Just ask users who still want to stick to Web Companion for collecting news to disable its automatic download and only sync manually, to minimize the impact. Of course, it'll still be impossible to distinguish between automated and manual downloads. To help with this, you could implement some kind of login - that is, MWC-based eNews download could be account-based. MWC is capable of this - even without my additional tools - see the POSTLogin section in the User Manual (http://www.winmobiletech.com/mwc/#_Toc40600671) of my tools. And, if a particular user takes too much bandwidth (because she or he doesn't disable the automatic news download), just ban his or her account.

RSS is indeed nice, but, especially when used with the right additional tools (for example, my pack), an advanced user can achieve FAR more with Web Companion than with anywhere else - downloading, filtering forum pages at once for offline reading and as he or she wants, for example. (That's only one of the additional features I've added to Web Companion.)

Menneisyys
05-21-2006, 01:46 PM
Hitting about every 2.5 seconds? That really is some dumbass programming.

Not really - the bot runs on users' desktop PCs (at least in older versions it did - dunno if the new version is centralized; because of the stupid EU laws, I don't think so). That is, if, say, 5000 users add a site to their MWC and synchronize content, say, 5 times a day, that's 25000 hits in total.
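
The arithmetic behind the last two posts can be checked quickly. The sketch below only uses figures quoted in this thread (37,499 hits in one day, and the hypothetical 5000 users syncing 5 times a day); the function names and the pages_per_sync parameter are made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def seconds_between_hits(hits_per_day: int) -> float:
    """Average gap between requests if they were spread evenly over a day."""
    return SECONDS_PER_DAY / hits_per_day


def daily_hits(users: int, syncs_per_user: int, pages_per_sync: int = 1) -> int:
    """Total requests per day from independent desktop clients."""
    return users * syncs_per_user * pages_per_sync


print(round(seconds_between_hits(37_499), 1))  # 2.3
print(daily_hits(5000, 5))                     # 25000
```

If each sync pulls more than one page, the total scales by pages_per_sync, so a distributed user base can produce this kind of load without any single client looking abusive.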

Menneisyys
05-21-2006, 01:50 PM
If you support full text RSS no user would ever need to use the screen scraper anyway.

Yup, if full RSS is offered (or a well-done RSS client that is also able to collect linked pages), RSS can be pretty good (particularly because there's no HTML layout markup overhead). In a lot of (advanced) cases, however, Web Companion offers unique capabilities.

(Not that I would use the latter any more - GPRS/EDGE is cheap/fast enough to be able to get news almost always online.)

Jason Dunn
05-21-2006, 04:03 PM
If you support full text RSS no user would ever need to use the screen scraper anyway.

And if I support full text RSS, no user would have a reason to come to the site. There's no revenue model in full text RSS feeds on mobile devices.

Jason Dunn
05-21-2006, 04:15 PM
There is a solution, though. Just ask users who still want to stick to Web Companion for collecting news to disable its automatic download and only sync manually, to minimize the impact.

No solution is viable if it relies on people voluntarily doing something because I ask them to.

Since you know so much about this, do you know if their eNews Client respects a robots.txt ban? I sure hope it does, because if all this traffic is coming from 5000 different MobiPocket users running the scraper on their desktop PCs, it will be impossible to do an IP address block.

I've now posted in their forum:
http://www.mobipocket.com/forum/viewtopic.php?t=1724

Jason Dunn
05-21-2006, 04:18 PM
I banned it yesterday evening, but in looking at a log file analysis of the first few hours after midnight, the eNews Client is still hitting our server. :evil: So it seems they do not respect a robots.txt ban. I'm not surprised, considering how little respect they have for content publishers. So the question is, how do I stop them? Do you know how The Register stopped them? Do they have a blacklist of URLs that their client will not scrape that I can get myself added to?
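
A log file analysis like the one mentioned here boils down to counting hits and bytes per User-Agent. The following is a minimal sketch that assumes the server writes the common "combined" access log format; the thread does not say which server or log format Thoughts Media uses, and the sample lines are invented (using documentation-only IP addresses).

```python
import re
from collections import Counter

# Combined Log Format (assumed):
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "[^"]*" (?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"[^"]*" "(?P<agent>[^"]*)"$'
)


def summarize(lines):
    """Return (hits, traffic) Counters keyed by User-Agent string."""
    hits, traffic = Counter(), Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match is None:
            continue  # skip lines that are not in the assumed format
        agent = match.group("agent")
        size = match.group("bytes")
        hits[agent] += 1
        traffic[agent] += 0 if size == "-" else int(size)
    return hits, traffic


# Invented sample entries, for illustration only.
SAMPLE = [
    '192.0.2.10 - - [21/May/2006:00:05:12 -0600] "GET / HTTP/1.1" 200 14321 "-" "eNews Creator"',
    '192.0.2.10 - - [21/May/2006:00:05:14 -0600] "GET /xml/ HTTP/1.1" 200 20480 "-" "eNews Creator"',
    '198.51.100.7 - - [21/May/2006:00:06:01 -0600] "GET /xml/ HTTP/1.1" 200 5120 "-" "Googlebot/2.1"',
]

if __name__ == "__main__":
    hits, traffic = summarize(SAMPLE)
    for agent, count in hits.most_common():
        print(f"{agent}: {count} hits, {traffic[agent]} bytes")
```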

Menneisyys
05-21-2006, 04:45 PM
I banned it yesterday evening, but in looking at a log file analysis of the first few hours after midnight, the eNews Client is still hitting our server. :evil: So it seems they do not respect a robots.txt ban. I'm not surprised, considering how little respect they have for content publishers. So the question is, how do I stop them? Do you know how The Register stopped them?

They didn't make an IP ban - just haven't returned anything (except the Forbidden HTTP status code) when they sensed the eNews Creator HTTP User Agent. That is, everything (all IP's) passed to the server; of them, only ones that weren't using the eNews Creator-specific User Agent were (are) served with actual content.

Menneisyys
05-21-2006, 04:53 PM
Since you know so much about this, do you know if their eNews Client respects a robots.txt ban? I sure hope it does, because if all this traffic is coming from 5000 different MobiPocket users running the scraper on their desktop PCs, it will be impossible to do an IP address block.

Dunno if the structure of the communication changed in the last three years (I've stopped working on the MWC Support Pack almost exactly three years ago when I completely switched to online content access) - I don't think much has been changed because last time I installed the latest Mobi, it installed almost exactly the same client on my desktop PC as 3-4-year-old versions.

These clients directly connect to the subscribed Web site, download entire HTML (not RSS or anything more machine-friendly) pages and parse the useful content out of them.

Most sites (see the example of The Register) defend themselves against Mobi clients by returning the bandwidth-friendly simple Forbidden header when they encounter the User-Agent.

User-Agents, fortunately, can't be set in MWC - at least this was the case 3 years ago. That is, casual MWC/Mobi users don't have a chance at making MWC collect articles from sites that ban them. Only by using a proxy server that offers transparent User-Agent spoofing (one that sits between MWC and the Web server and uses a standard desktop IE header to identify itself to the Web server, completely overriding the User-Agent sent out by MWC) can anyone download anything from a protected Web site.

Menneisyys
05-21-2006, 04:58 PM
No solution is viable if it relies on people voluntarily doing something because I ask them to.

You can force them - MWC supports cookies / cookie-based authentication and, therefore, you can always know who is downloading content with MWC and who with IE. Then, you can easily filter out who hasn't switched to manual, bandwidth-friendly mode. Also, you can make MWC-based access account/password-based only - that is, say, unavailable for non-subscribers.
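
The account-based idea above (identify each MWC user via login or cookie, then cut off accounts that use too much bandwidth) could look roughly like this on the server side. This is only a sketch under assumptions: the AccountQuota class, its method names, and the daily byte limit are hypothetical, and how the account name is extracted from MWC's cookies is left out entirely.

```python
from collections import defaultdict


class AccountQuota:
    """Track bytes served per account per day and ban heavy users."""

    def __init__(self, daily_limit_bytes: int):
        self.daily_limit_bytes = daily_limit_bytes
        self._used = defaultdict(int)  # (account, day) -> bytes served
        self.banned = set()

    def record(self, account: str, day: str, size: int) -> bool:
        """Record a download; return True if the account may keep being served."""
        if account in self.banned:
            return False
        self._used[(account, day)] += size
        if self._used[(account, day)] > self.daily_limit_bytes:
            self.banned.add(account)  # over budget: ban from now on
            return False
        return True


if __name__ == "__main__":
    quota = AccountQuota(daily_limit_bytes=1_000_000)  # limit is an arbitrary example
    print(quota.record("manual_syncer", "2006-05-21", 200_000))
    print(quota.record("auto_syncer", "2006-05-21", 1_500_000))
```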

(Let me know if you need more info on the network model / the built-in capabilities of the latest MWC version and I'll check them to see if anything has changed in the last 3 years, networking-wise.)

FallN
05-21-2006, 07:44 PM
I've personally never liked Mobipocket products. They always seemed junky and unpolished to me. I would ban their IPs and be done with it. Save yourself and the servers the headache.

manywhere
05-21-2006, 09:03 PM
I banned it yesterday evening, but in looking at a log file analysis of the first few hours after midnight, the eNews Client is still hitting our server. :evil: So it seems they do not respect a robots.txt ban. I'm not surprised, considering how little respect they have for content publishers. So the question is, how do I stop them? Do you know how The Register stopped them?

They didn't make an IP ban - just haven't returned anything (except the Forbidden HTTP status code) when they sensed the eNews Creator HTTP User Agent. That is, everything (all IP's) passed to the server; of them, only ones that weren't using the eNews Creator-specific User Agent were (are) served with actual content.

Yup. I think they "issued the block" by configuring the .htaccess file to return a 403 error code (i.e. Forbidden) when the Mobipocket user agent is sent in the HTTP request header. It's quite simple and used extensively, so a simple Google search on it will return lots of results. You could also put a custom message to be shown, if that feels like the right thing to do.

You're right in doing this, Jason. I'm really surprised that such bad coding, which ignores both robots.txt and the "is this page updated" HTTP request method, has been released to the public. Bad, bad developer(s). :twak:
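
The posts above describe doing this in .htaccess. As a language-neutral illustration of the same idea, here is a minimal Python WSGI middleware sketch that answers 403 Forbidden with a short custom message whenever the User-Agent contains an assumed "eNews Creator" token. It is not what The Register or Thoughts Media actually deployed; the token, the message text, and the function names are all assumptions.

```python
from wsgiref.util import setup_testing_defaults

# Assumed substring of the scraper's User-Agent header (lowercase).
BLOCKED_AGENT_TOKENS = ("enews creator",)
BLOCK_MESSAGE = b"This client has been blocked because of excessive requests.\n"


def block_scrapers(app):
    """Wrap a WSGI app so blocked User-Agents get a lightweight 403."""
    def middleware(environ, start_response):
        agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(token in agent for token in BLOCKED_AGENT_TOKENS):
            start_response("403 Forbidden", [
                ("Content-Type", "text/plain; charset=utf-8"),
                ("Content-Length", str(len(BLOCK_MESSAGE))),
            ])
            return [BLOCK_MESSAGE]
        return app(environ, start_response)
    return middleware


def site(environ, start_response):
    """Stand-in for the real site: always serves a small page."""
    body = b"Regular page content\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def status_for(user_agent: str) -> str:
    """Run one fake request through the middleware and return its status line."""
    environ = {}
    setup_testing_defaults(environ)
    environ["HTTP_USER_AGENT"] = user_agent
    seen = []
    block_scrapers(site)(environ, lambda status, headers: seen.append(status))
    return seen[0]


if __name__ == "__main__":
    print(status_for("eNews Creator"))
    print(status_for("Mozilla/4.0 (compatible; MSIE 6.0)"))
```

Because the check happens before any page is generated, the blocked request costs the server little more than the headers and the one-line message.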

BevHoward
05-22-2006, 08:17 AM
Any company's decision to make it difficult to impossible to easily contact them on any matter is an important clue to how they view their customers and is one of the factors I use when making purchase decisions _before_ the purchase.

MobiPocket, despite offering a good lit generator, failed this criterion years ago, so this event does not surprise.

Good luck pursuing it... if they do respond, be sure to email them the url of this thread ;-)

Jason Dunn
05-23-2006, 06:38 PM
I've traded a few private messages with Mobipocket, but the bulk of the conversation has been in this thread:

http://www.mobipocket.com/forum/viewtopic.php?p=6018#6018

Basically their attitude is that it's not their problem my site is so popular. :?

I find it fascinating that not one person that uses their tool has popped into this thread to comment.

Menneisyys
05-23-2006, 07:01 PM
I've traded a few private messages with Mobipocket, but the bulk of the conversation has been in this thread:

http://www.mobipocket.com/forum/viewtopic.php?p=6018#6018

Basically their attitude is that it's not their problem my site is so popular. :?



Well, it's indeed pretty hard to control, from a centralized location, independent scraper clients that don't rely on any central server permission (or any TTL scheme). That is, eNews Creator users can freely add any site they want to replicate to their configuration - no one can control this from anywhere else. The only way to stop the excessive downloads is selective prohibition.

The easiest and cleanest way is just implementing a User-Agent-based "ban" (that is, just returning a lightweight "Forbidden" message, putting almost no additional burden on the Web server) as the Register has done years ago when they banned out eNews Creator.

Jason Dunn
05-23-2006, 07:46 PM
Well, it's indeed pretty hard to control, from a centralized location, independent scraper clients that don't rely on any central server permission (or any TTL scheme). That is, eNews Creator users can freely add any site they want to replicate to their configuration - no one can control this from anywhere else. The only way to stop the excessive downloads is selective prohibition.

The idea of scraper clients is pretty hostile to Web servers, but if Mobipocket had created it with some logical limits, I think it would have been ok. Things such as not hitting a server more than once every 60 minutes automatically, limiting the total number of pages scraped to 20 or something reasonable...my understanding is that they don't limit what the user does, and let's face it, most people don't think about server resources or what kind of damage they might be doing, they just click a few buttons and get what they want. A responsible software developer would build in some reasonable limits to balance the needs of the customer with that of the publisher.
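
The limits suggested in this post (no more than one automatic sync per site every 60 minutes, and a cap of about 20 pages per sync) are cheap to build into a client. Below is a minimal sketch of such a guard; the PoliteScheduler class and its parameters are hypothetical and are not part of Mobipocket's software.

```python
import time


class PoliteScheduler:
    """Decide which pages an automatic sync may fetch right now."""

    def __init__(self, min_interval_s=60 * 60, max_pages=20, clock=time.monotonic):
        self.min_interval_s = min_interval_s  # minimum gap between syncs per host
        self.max_pages = max_pages            # cap on pages fetched per sync
        self._clock = clock
        self._last_sync = {}                  # host -> time of last allowed sync

    def plan_sync(self, host, urls):
        """Return the URLs to fetch, or [] if this host was synced too recently."""
        now = self._clock()
        last = self._last_sync.get(host)
        if last is not None and now - last < self.min_interval_s:
            return []
        self._last_sync[host] = now
        return list(urls)[: self.max_pages]


if __name__ == "__main__":
    scheduler = PoliteScheduler()
    pages = [f"/page{i}" for i in range(50)]
    print(len(scheduler.plan_sync("www.pocketpcthoughts.com", pages)))  # first sync
    print(len(scheduler.plan_sync("www.pocketpcthoughts.com", pages)))  # immediate retry
```

A manual sync button could bypass the interval check while still respecting the page cap, which keeps the user in control without letting a background loop hammer the server.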

Menneisyys
05-23-2006, 09:03 PM
Well, it's indeed pretty hard to control, from a centralized location, independent scraper clients that don't rely on any central server permission (or any TTL scheme). That is, eNews Creator users can freely add any site they want to replicate to their configuration - no one can control this from anywhere else. The only way to stop the excessive downloads is selective prohibition.

The idea of scraper clients is pretty hostile to Web servers, but if Mobipocket had created it with some logical limits, I think it would have been ok. Things such as not hitting a server more than once every 60 minutes automatically, limiting the total number of pages scraped to 20 or something reasonable...my understanding is that they don't limit what the user does, and let's face it, most people don't think about server resources or what kind of damage they might be doing, they just click a few buttons and get what they want. A responsible software developer would build in some reasonable limits to balance the needs of the customer with that of the publisher.

Now I understand.

From the programmer's point of view, MWC is a very simple app. It lacks almost every advanced capability, for example content filtering or selective link-following. (This is why I've written a complete, additional filter proxy to augment its capabilities.) This is why it also lacks any kind of constraints - except for a very simple 5-minute timeout.

As the Mobi folks seem to completely drop HTML-based eNews from the next version of MWC, I don't think they'll ever implement this. That is, the practical option is simply not serving MWC clients, based on their User-Agent HTTP header (which isn't spoofable/changeable unless they use filtering proxies, which the casual user will never bother to install and set up).

Jason Dunn
05-23-2006, 11:24 PM
Oh, and by the way, we've now fully blocked their client. Anyone hitting the site with the eNews Creator client gets a message saying why it has been blocked, and the URL to my post here.

GoldKey
05-24-2006, 05:47 PM
Basically their attitude is that it's not their problem my site is so popular. :?

Technically, they've built a tool that, depending on how it was used, could in essence be used as part of a DoS attack. Since it would be very easy to inadvertently set it up to be way too aggressive, you'd think they would be a little more concerned.

uhoo
05-26-2006, 08:17 AM
Well, I used to use the eNews Creator but I've stopped using the current version because it's a bandwidth hog. The previous versions were pretty simple: you could start a sync manually or schedule a daily sync. The current version seems to sync in a loop, i.e. as soon as it finishes a sync, it starts all over again. I'm on a limited internet plan, and with some feeds like the BBC taking up 20 MB every sync, it adds up quickly. It opens multiple connections, so it manages to slow even my 24 Mbps connection to a crawl. As soon as I realized what was happening I stopped using it. I've no doubt that if they continue, they'll soon have no users.

Jason Dunn
05-26-2006, 04:30 PM
The current version seems to sync in a loop, i.e. as soon as it finishes a sync, it starts all over again. I'm on a limited internet plan, and with some feeds like the BBC taking up 20 MB every sync, it adds up quickly. It opens multiple connections, so it manages to slow even my 24 Mbps connection to a crawl.

You know, that might explain the insane number of hits we were seeing... 8O

mobi_aurelien
05-26-2006, 05:21 PM
Well,

I am very surprised by some figures announced in this forum. There might have been a problem with one of our old versions of the eNews Creator. The point is we are totally switching from the old style of eNews, which is still used by many people, to a fully RSS-compliant eNews system.

So, we are really sorry for the issues you all seem to have faced, and I hope you will form your own opinion of the new eNews system when we release it :oops: (it is still in public beta today)...

BTW , Great forum :D

Aurélien

Miz
05-26-2006, 08:27 PM
This reminds me of the avantgo issue you had some time back. Isn't this pretty much the same?

Btw, it's kinda surprising to see that many people didn't upgrade away from Mobipocket, because around version 4.0 (a year or more ago) the news creator was horribly inefficient. It was very slow and took a lot of bandwidth. I don't know how the new version is because I stopped using it some time ago.