Sunday, December 10, 2006

ByIndia and Web2 Corp, perpetrators of copyright violations

It's one thing when an individual copies content from other sites and presents it as their own.

It's quite another when a company listed on the NASDAQ indulges in copyright violation.

The company is Web2 Corporation (NASDAQ OTCB: WBTO), which recently acquired the perpetrators, ByIndia.

India is internationally recognized for producing the best programmers in the world. However, no Indian programming corporation has ever taken on the monumental project of developing an Indian-specific search engine.


Oh! How I cringe when I see what the best Indian programmers are churning out. Site scraping and RSS scraping bots. Shame! More shame to Web2 Corp, a NASDAQ company.

Web2 Corp (NASDAQ OTCB: WBTO) Web2 Corporation is an Internet technology company focused on improving the ways people and businesses utilize the power of the Internet. The firm specializes in rapid adaptation of technologies to address new markets of users by reducing the level of technical skills required, simplifying useful products, and lowering prices.


Here's a list of some of the blogs copied by ByIndia. You might see an error message when you click through to the ByIndia blogs. If you do, edit the URL in your address bar to remove the blogger's name, so that it reads http://blogs.byindia.com, and load that page. Then type the blogger's name back in and load the page again to see the blog.
Original blog | Copied content on ByIndia
Indian Food Rocks | http://blogs.byindia.com/dhivya
Sailu's Food | http://blogs.byindia.com/bala
Kamla Bhatt | http://blogs.byindia.com/cale
Gaurav Sabnis' Vantage Point | http://blogs.byindia.com/anitha
VKN's My Dhaba | http://blogs.byindia.com/muthu
Hindu Mommy | http://blogs.byindia.com/badu
The Great Indian Mutiny | http://blogs.byindia.com/raji
Polite Indian | http://blogs.byindia.com/rajee
Wafter | http://blogs.byindia.com/sencier20
Maine Line | http://blogs.byindia.com/abia
Dilip D'Souza's Death Ends Fun | http://blogs.byindia.com/mahar
Coconut Generation | http://blogs.byindia.com/calla
Watching India | http://blogs.byindia.com/guru20
Reflections | http://blogs.byindia.com/agila
India, Ink. | http://blogs.byindia.com/vidya
Reality Check | http://blogs.byindia.com/pujit


...and many more. Most of these copycat blogs were created recently, in November 2006.

This takes me back to the much-hyped launch of InstaBlogs in October 2005, which had also copied content from various sources. They claimed they moved quickly to rectify the situation; I haven't been back to that network, so I don't know whether they did.

I cringe even more when I think about the common thread in these copyright violations. Indian. Indian blog networks. Indian bloggers. Indian web sites.

If your blog is mentioned above and you are indeed blogging on ByIndia under another nickname, my apologies to you. Please leave me a comment and I will remove your blog from the list.

11 Comments:

At 12:22 PM, December 11, 2006, Anonymous said...

Hey

Thanks so much for informing me about this.

I was totally unaware.

Guess there is not much we can do about it, or is there?

 
At 9:28 AM, December 12, 2006, Manisha said...

Hindu Mommy, I wrote to Web2 Corp about this and got a canned response from Bill Mobley, the CEO. He said that they knew it was a malicious bot / spider / spyder that had been auto-creating these blogs in order to sabotage byIndia, and that the problem had been fixed. But it took ages for them to take the content down. The blogs are still there, but now display a message saying that the blog has been banned for inappropriate content.

In another email, Bill Mobley said that the content providers for our blogs need to protect our content better. I have never heard anything like this before! It is the nature of the web: anything that can be read by a browser can be accessed by any user-agent, and RSS feeds are part and parcel of how content is read today. I began to suspect that he was clueless, and that was confirmed when he said that they can't tell the difference between a genuine blog (their platform has been live since 2005) and an auto-created one! So they rely on complaints such as mine to find these copyright violations. I am simply aghast! This, from the CEO of a company that is projecting byIndia as the next Baidu! They know a bot created these blogs. How difficult is it to track this bot's footprints and delete every account it created? Unless...their app is not what they claim it is!

Currently the blogs are still there and the content is still with them (check out the titles of past posts), but it is not displayed.

What can we do about it? There are several things.
- Check out Dealing with Plagiarism. It is a great FAQ and guideline.
- Inform other bloggers or web site owners when you see a copyright violation. Unless we work together on this, it's not going to go away soon.
- Bear in mind that if it is an individual, they may not be aware that copying without permission and/or attribution is illegal. Always inform them first, as politely as you can.

 
At 1:10 PM, December 12, 2006, Polite Indian said...

Thanks for letting us know about this. I had no idea people were doing this.

I will go through your FAQ and see what I can do to check this.

Thanks a lot.

 
At 7:05 AM, December 13, 2006, John Sebastian said...

Geez, give these guys a break.

For one thing, you are complaining about the possible technical ineptitude of the CEO of a public company. A CEO is not in the data center. A CEO does not write code. It is not his job or position to do ANYTHING remotely related to code on machines. That is the job of the programmers and department heads underneath him.

They took action; they removed the blogs. You say it took forever, but things like that take time. You also say that every blog platform out there can auto-detect bot-created blogs, and that is patently false. HUGE websites, like craigslist, rely on user feedback to remove or edit posts. You guys provided that feedback, and they took action. End of story.

 
At 12:12 PM, December 13, 2006, Trevor said...

Manisha,

Let me apologize that you were unhappy with the way we handled the automated site scraping you describe in this post. We were attacked by a malicious coder’s spyder and fixed the situation as soon as we could. The total turnaround time between your first complaint arriving in our mailboxes and the removal of every site you've listed here was less than a week.

I know that in internet time that may seem very long indeed, but given the time difference between America and India, and the complications it always adds to communication, we handled this as quickly as possible. The blogs no longer contain any content, although their addresses are still taken. The empty blogs have been left up to prevent your site from getting spidered again.

This automated site-scraping attack on our blog service was clearly manufactured to damage our reputation with the Indian internet community, and I am sorry that, at least in this case, it has clearly done so. We at Web2Corp still feel that ByIndia.com represents the best search engine and social network for Indians and by Indians, and we welcome you and any of your friends to try it out whenever you like.

 
At 12:28 PM, December 13, 2006, Anonymous said...

Manisha:

Thanks for finding out about this and for all the resources you offered. They are definitely useful.

Regards,
Hindu Mommy

 
At 8:38 PM, December 13, 2006, Anonymous said...

Hi, You have done a wonderful job of alerting people about it. Apparently, ByIndia people also have noticed your post, and have 'banned' those blogs (whatever 'banning' means).

Thanks for the great job! Keep it up.

 
At 5:23 PM, December 14, 2006, shilpa said...

Manisha, this is indeed a great job. Some people do it unknowingly, and some people continue doing it even after they know it is wrong. I am very happy to see this blog, where we can at least create awareness among people. Thanks to you :).

 
At 5:27 PM, December 14, 2006, Manisha said...

My apologies for not posting the comments earlier. The Gmail account for this blog was one of the many experiencing "server error" problems yesterday.

John Sebastian, I agree with you, which is why I was surprised to receive a reply to my email from the CEO. Then a second email, sent in response to my reply, also came from him, or at least from his email address. It said that his system had not been compromised, and that the systems hosting the copied blogs needed to find a better way to protect content.

A basic requirement for any web application today, especially one poised to be India's answer to everything on the net, is to ensure that user accounts cannot be auto-created.

Trevor, I was very happy to see Web2 Corp's first reply to my email. However, a search showed that several others had complained many days before I was made aware of this and had received an identical reply. Also, the blogs were still up there. A cursory look by others shows that there are still many blogs that you need to take down.

Don't get me wrong: I totally appreciate you taking the time to post an apology here. That has a cost attached to it too, and I am well aware of that. What didn't sit right is that, instead of taking responsibility for what happened, Web2 Corp decided to point fingers at the service providers who host our blogs.

Hindu Mommy, Abi, Polite Indian, thank you for your kind words. I have been running into plagiarism of both content and images a great deal in the last few months. I recently had my copyright statement copied. The plagiarist was so inept that she forgot to change the hyperlink's underlying URL, so, in a rather ironic twist, the copyright statement brought visitors right back to my site!

 
At 6:43 AM, January 11, 2007, Trevor said...

Sorry I’ve taken so long to get back to you, Manisha. I wanted to wait until I had some definite news.

I know that our CEO pointing fingers at the web service providers irked you. He misunderstood what our software team told him when he asked how this had happened, and as a result the email he sent you was misinformed. There is, of course, no way for content providers to protect their blogs from being spidered and reposted by a bot. We're aware of this and are currently looking for a thorough, automated solution to the problem.

We are currently developing a technology that will notify us if blogs are copying material, but this doesn’t really solve the problem. Many bloggers copy press releases on topics that interest them, with or without crediting them properly. This has become an accepted practice in the blogging community, but it also means that any technology that looks for duplicated blog posts will find a large number of false positives. Also, with the rise of services such as PayPerPost, it is not uncommon for bloggers to copy *themselves* as a way to earn more money.

In short, simply identifying blogs that contain copied content does not mean that we are locating only scraped blogs, so we would have to dedicate someone to manually filtering the results to make sure that we aren’t suspending accounts without cause. Since we’re still a small company with big dreams, we haven’t been able to justify creating a position for just that kind of work. Once we’re as big as Google or Yahoo, we may be able to afford that, but for now we have to rely on the understanding and help of bloggers who find their own material copied on our blogs to tell us what has happened.

I’m sorry that content that you created has been copied. Please let the site administrator know at the address posted on ByIndia.com if this occurs again to you or any other blogger that you know and we will address it as soon as we can.

Thank you for your patience and understanding,

Trevor Longino
Communications Director, Web2Corp.

 
At 10:52 PM, January 11, 2007, Manisha said...

Trevor, thank you for following up on this and setting the record straight. I really appreciate it.

There is no doubt that there are several challenges ahead for ByIndia. I wish you every success.

I do have a request: While you are working on setting up the manual filters to ensure that copyrights are not violated, could you do something about these irritants?

 
