Always New Mistakes

November 21, 2007

Powerlabs: An insight into Powerset’s technology

Filed under: Business, Natural Language Processing - Alex Barrera @ 7:42 pm

I finally received my invitation to Powerset’s Powerlabs website. I’ve been playing with it for a couple of weeks now and I’m quite impressed with some of the things they’ve accomplished. Powerlabs is an invitation-only community for beta testers, built around five demos (they added a new one just yesterday). Its main goals are to show Powerset’s technology (via the demos) and to discuss problems, questions or ideas related to either the technology or the web interface.

The website has five main sections: Dashboard (like your home page), Demos, Discussions, Queries (wished-for queries by other members) and People (a list and ranking of current members). You can basically break the website into two big parts, the demos and the discussion area (more on this later).

The Dashboard is the user’s homepage for Powerlabs. As you can see in the screenshot, here you can monitor your stats within the community (number of discussions, number of comments, global rank based on karma points, etc.), the latest Powerlabs news (“New Sports Demo” right now), a list of recently implemented ideas (with a link to the post where the idea was proposed and its author) and your news feed. The news feed is probably one of the best parts of Powerlabs. It’s quite similar to the one you find on Facebook, and it basically keeps you updated on the latest activity related to your user and your submitted ideas.


The Demos section is one of the most important areas of the website. Here you can play with five demos that show you Powerset’s technology at work. I stress the word technology because you won’t find a Natural Language Search demo here. It’s not a demo of the product; it’s a demo of the algorithms they are using to build the product. The demos have two big restrictions: they use predefined queries (you can only fill in some words of a longer phrase) and they only work with the Wikipedia corpus (it seems they are trying to expand the corpus in the very near future). The demos are divided into several categories: sports, the arts, business, quotes and PowerMouse. The first four are the same demo; the only difference is in the queries you can ask.

[Screenshot: sports demo 1]

For example, for the sports demo you can ask some of the following questions:

  • What did X win?
  • What did X draft?
  • Who X (defeated or beat) X?

For the business demo, you can ask things like:

  • What does X own?
  • Who did X acquire?

[Screenshot: sports demo 2]

The PowerMouse demo is probably the most fun to play with. It lets “you examine how structured information is extracted from open text”. As they say, it’s not a search application per se, but a window into how the results are obtained. When you start the demo you are asked to fill in the following structure: Something (subject) – Connection (verb) – Something (object). There are no restrictions on which three words you use. The demo will give you all the possible combinations for your query. When it can’t find the exact query, it will use broader words to try to get what you were looking for. Take a look at the screenshot for more insight.

[Screenshot: PowerMouse demo]
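To make the subject – connection – object idea a bit more concrete, here is a minimal sketch of how such a lookup with a "broader words" fallback could work. This is only my guess at the general shape, not Powerset's implementation: the facts, the `BROADER` table and the function names are all hypothetical.

```python
# Hypothetical toy facts in (subject, connection, object) form.
# These are made up for illustration; they are not Powerset data.
FACTS = [
    ("google", "acquire", "youtube"),
    ("company", "acquire", "startup"),
    ("yankees", "win", "world series"),
]

# Hypothetical hand-made table mapping a word to a broader word.
BROADER = {"google": "company", "youtube": "startup", "yankees": "team"}


def match(subject=None, connection=None, obj=None):
    """Return every fact that fits the pattern; None acts as a wildcard."""
    pattern = (subject, connection, obj)
    return [
        fact for fact in FACTS
        if all(p is None or p == value for p, value in zip(pattern, fact))
    ]


def search(subject=None, connection=None, obj=None):
    """Try the exact query first, then retry once with broader words."""
    hits = match(subject, connection, obj)
    if hits:
        return hits
    return match(BROADER.get(subject, subject), connection,
                 BROADER.get(obj, obj))
```

Leaving a slot empty (`None`) lists every combination for the remaining words, and a query with no exact hit, such as `search("google", "acquire", "startup")`, falls back to the broader subject "company".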

The Discussions area is the heart of the website. It is like a forum, but on steroids. It’s divided into various categories (wikipedia, query examples, labs, the arts, …). There is also a category where you’ll see all the posts, ordered either by date or by relevancy. Each post has its title, author, number of views, votes and comments. It works in a similar way to Digg: members read a post and vote if they like the idea. The more votes an idea gets, the higher it climbs on the relevancy list. For each post you can also set a flag that lets you follow its activity (you’ll get updates on your news feed). Each time a comment or an idea you’ve posted gets a vote, or someone comments on it, you’ll get a new entry on your news feed. There is even an RSS feed for the discussions, although the URL is hidden in one of the posts. One of the coolest features is that, before you post something, the system suggests similar ideas that have already been submitted. If someone has already posted a similar question or idea, you’ll know before you actually send yours, which avoids duplicate posts.


[Screenshots: Idea 1, Idea 2]
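For the curious, the two mechanisms described above (ordering by votes and suggesting similar ideas before you post) can be sketched in a few lines. I don't know how Powerlabs actually does it; the `Post` class, the word-overlap measure and the 0.5 threshold below are all assumptions of mine.

```python
from dataclasses import dataclass


@dataclass
class Post:
    """Hypothetical discussion post: just a title and a vote count."""
    title: str
    votes: int = 0


def by_relevancy(posts):
    """Most-voted posts first; ties keep their original order."""
    return sorted(posts, key=lambda post: post.votes, reverse=True)


def similar_posts(new_title, posts, threshold=0.5):
    """Existing posts whose titles share enough words with new_title.

    Similarity is the Jaccard overlap of the two word sets, which is an
    assumed stand-in for whatever the real site uses.
    """
    new_words = set(new_title.lower().split())
    matches = []
    for post in posts:
        words = set(post.title.lower().split())
        union = new_words | words
        overlap = len(new_words & words) / len(union) if union else 0.0
        if overlap >= threshold:
            matches.append(post)
    return matches
```

Word overlap is crude (it ignores synonyms and word order), but it is enough to catch near-identical titles before a duplicate gets posted.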

The Queries section lets you browse some of the members’ wished-for queries. It lets you create new queries and comment on the ones that are already stored.

The site revolves around the idea of karma points, similar to many other ranking/voting systems such as Slashdot, Digg or ycnews. You earn points for commenting, for using the demos, for posting an idea and for every vote your ideas or comments get. The People area lets you see which members are on the system. You can order them by karma rank, recent activity, number of ideas, etc. In the future some demos will require a certain level of karma, so it’s important to reach a good karma level.
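A karma system like this boils down to a weighted count of actions. Here is a minimal sketch; the point values are invented, since Powerlabs doesn't say how much each action is worth, and the action names are hypothetical too.

```python
# Hypothetical point values per action; the real weights are not public.
POINTS = {"comment": 1, "demo_use": 1, "idea": 2, "vote_received": 1}


def karma(actions):
    """Sum the points for a list of action names, ignoring unknown ones."""
    return sum(POINTS.get(action, 0) for action in actions)


def rank(members):
    """Given {name: [actions]}, return the names ordered by karma, highest first."""
    return sorted(members, key=lambda name: karma(members[name]), reverse=True)
```

Gating a demo behind a karma level would then just be a comparison of `karma(...)` against a minimum value.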


My opinion? Well, I think Powerset has achieved a great goal: getting lots of testers involved. It’s true that the demos disappoint (I think most people expected a less rigid demo), but hey, at least they are showing that the technology isn’t vaporware. I love the approach of letting people participate at all stages of product development. Few companies do that, and it’s a breath of fresh air. The interface and user experience of Powerlabs are awesome, among the best I’ve seen so far: easy, straightforward and useful. Importing the voting scheme from places like Digg is a very smart move; they’ve managed to engage a lot of users, and that’s great. The downside is that, after playing for some weeks, the lack of more comprehensive questions and corpora means you end up not knowing what else to do. In my opinion, it lacks three important things: an RSS feed for the news feed (so you can keep updated instead of having to refresh the browser), a way to ask much more open questions, and a bigger, updated corpus (I think this might be on its way). Nevertheless, they are moving fast and adding new features each week, so I’ll keep checking and I’ll update when necessary.

Any Powerseters willing to add some comments?

Image credits: Powerset

November 19, 2007

Breaking the Internet barrier

Filed under: Business - Alex Barrera @ 12:12 am

Some days ago I read this post and realized how much the media has changed in the last few years. Not that I hadn’t noticed, but it suddenly struck me that we are changing our entertainment habits at an astonishing rate. How many of you still watch TV? I know for sure I don’t, and it’s been a while. It seems as if “offline” businesses are crumbling and making room for online businesses. What got me thinking was the idea of script writers from online TV shows displacing “offline” script writers. Is it a good idea? Will Hollywood bring in unknown script writers from the Internet to fill in for the writers on strike? On one hand, this is a great idea. I’ve always thought that some worlds are way too insular, and Hollywood is one of them. What is great about the Internet is that you don’t have to do expensive studies to see if something works; you just put it online and wait to see how users react to it. So now you don’t have to do castings. You just search the Internet and bring on board the writers of the best shows on the Internet, period. But is this going to work? Even though the format is the same, the medium is quite different. That means that the audience is also different. So, if you bring good script writers to TV, will they grab the same audience share as on the Internet? I don’t think so.

Take a look at the Fake Steve Jobs blog. I’m a great fan and I enjoy reading it. I think Daniel is an excellent writer, one of the best I’ve read in quite some time. His blog is followed by millions of readers. Recently he wrote a book titled Options: The Secret Life of Steve Jobs, a Parody (I haven’t read it yet, but it’s on my Christmas list). I was amused by one of his latest posts: “Have you seen his book? It’s awful. I mean I’m a big fan of Colbert’s TV show and I know he hired a huge team of writers to work on the book for him but honestly, no kidding, this thing sucks ass. Nevertheless it’s a huge best-seller, while my own brilliant memoir … um, isn’t”. Let’s get some numbers, shall we? OK, FSJ’s book ranking on Amazon is, as of today, #1,894. Colbert’s book is #8. If we get back to our online world, according to Techmeme’s leaderboard, FSJ holds a rather nice #64 position on the world’s blog list (#53 if you look at Technorati). So, how can that be? If people follow FSJ’s blog on a daily basis, why don’t they all buy his book? Some might argue that if you read him online, you are going to read him offline, but in my opinion we behave quite differently in our online and offline states. The same thing doesn’t *have* to work on both sides of the line. It might, but as we see, the numbers tell another story. In my case, I haven’t bought the book yet because I have little time. It’s faster to just read his blog (and many others). Might this be a common reason for other people?

Now, back to Hollywood and the Internet. It might work, or some of it might, but ultimately I do think the future is the other way round. That is, Hollywood script writers leaving the big studios to set up their own Internet productions. Let’s face it, it’s currently ridiculously cheap to produce an Internet show; just take a look at Scoble Show or Diggnation. I love them, but hey, they cost a tiny fraction of what a TV show costs. So I wonder, why don’t the writers just make the leap and start writing their own shows? Why not take the path that their cousins in the music industry are taking? They could control their creative work and could make a hell of a lot more money. Times are changing. People like to watch their favorite shows on demand, not at a predefined hour as on TV, so why not just produce shows exclusively for the Internet? Maybe I’m too futuristic about this, but looking back, I’m amazed at the speed at which things are changing (or it might be that I’m getting old). If you don’t follow the people, you’ll be left behind.

As always, constructive criticism, opinions and the like are welcome. What do people think about all this? Just for the record, I think the strike is something the writers should have done much earlier. Keep it up, guys, and just make the final leap to the Internet, even though I’ll miss some shows! (this is the list of our favorite TV shows that are affected by the strike).

UPDATE: Seems like the Hollywood writers are really jumping to the startup arena.

Image credits: Craig Aurness/Corbis

November 15, 2007

Are social networks putting us in danger?

Filed under: Security - Alex Barrera @ 2:51 pm

Today I would like to talk about an issue I’ve been ranting about lately. Are social networks putting our private and personal information in danger? I’ve been working in the information security industry for some time and I’ve seen many crazy things. With the recent popularity of social networks, we are beginning to see a shift in which information gets stolen. At first the bad guys targeted big company servers, but nowadays exploiting remote bugs on current operating systems is getting much harder (thanks to things like ASLR, non-executable stacks, grsec, etc.). That’s why the bad guys are focusing on hacking browsers and web applications. Each day we spend more and more time playing, working and using web applications, gradually increasing the time we are exposed to them. Given that the use of social networks is expanding at an incredible rate, and that part of the experience consists of giving away our personal and private data, we have a ticking bomb on our hands.

So, we have motivation, we have interesting information to steal and, best of all, we have a huge community of web developers who lack the security knowledge to code secure and reliable web applications. Don’t get me wrong, it’s not that the developers don’t care. They do care, but they don’t know what to look for, they don’t know how an exploit works and, of course, they don’t have time to deal with it. Dealing with scalability issues or SEO strategies feels much more important. The problem is that, with the growing popularity of social networks and things like Facebook apps or OpenSocial, these issues are gaining real weight. But you might think, why should I care? It’s not as if I’m exposing my credit card number, is it? False: you are exposing a wealth of information that is much more important. We are who we are: our hobbies, our sports, our political views, our friends, etc. If someone can steal that information, we could easily be impersonated at all levels. The consequences range from becoming victims of online scams, telephone scams or bank scams to being denied a job because of some piece of information floating around. It can even cost you business deals or strategic partners.

Now the facts. It took theharmonyguy 45 minutes to find a way to hack the RockYou OpenSocial application emote. It took him 20 minutes to hack the iLike application on Ning. Today theharmonyguy announced that the Compare People application on Facebook leaks some private information to the AdSense network. Well, that is some scary stuff. Not only are we going to be data mined by Facebook, but we are also being targeted by AdSense at the same time. Last, but not least, we have the great MySpace hack of Alicia Keys’ profile. The exploit was rather trivial, not highly sophisticated, and quite easy to avoid in most cases. Worst of all, most social networks aren’t listening to the security experts who are pointing out other hacks, scams or flaws in their systems. On the other hand, it’s true that most application developers for social network platforms respond quickly when a security flaw is found in their products. Why the actual social networks care less is beyond my understanding.

I don’t want to claim that anyone can eradicate all security bugs. They will always exist, for as long as we are human. What I want to point out is that most of the bugs come from lazy development. Right now there is much more at stake than there was two years ago, so guys, pull out your security hats and let’s hack some decent code, for the sake of all the “social networkers”.

UPDATE: As you can see, the guys from Compare People moved fast and responded promptly to this issue. You can see their comments below. As far as I know, Facebook applications shouldn’t be using parts of a user’s profile to feed AdSense, or at least they should alert you about it. I hope this gets straightened out pretty soon. Thanks again to Naval Ravikant for the comments and the fast response.

UPDATE2: Venturebeat has a statement from a Google spokesman: “We recently allowed some application partners to send us additional keywords to improve ad performance. A limited number of the keywords sent to Google did not comply with the developer’s agreement with Facebook. When we realized this conflict, we asked the partners to discontinue sending those keywords. We are no longer using those keywords. No personally identifiable information was exchanged between Google and the application developers”. They do have a good point: is it going to take a blogger whistleblower to identify security breaches? Is it going to be like this with OpenSocial? Let’s hope not. At least they responded pretty fast to the issue.

November 7, 2007

Some numbers on the Radiohead album

Filed under: Business - Alex Barrera @ 3:01 pm

Yesterday, comScore issued a press release with some numbers on the Radiohead album experiment. The data covers the first 29 days of the experiment and is based on a sample of 2 million people. The percentage of people who paid for the album was 38% (worldwide), while 62% downloaded it for free. These numbers are a long way from the ones I posted on the bagels experiment: 62% free downloads versus only 13% unpaid bagels (87% of bagel customers paid). As I’ve said before, could this be due to the Internet’s anonymous nature? I am beginning to think it has to do with a feeling of previewing. People download the album for free, play it for some days and, if they like it, buy it. So it’s more of a quality-reward scheme. For me it’s like the shopping experience: you take several T-shirts, try them on, see how you look in them, and buy them only if you look good.

Nevertheless, I think comScore’s numbers might be a little flawed. Most people I know downloaded the album first and bought it after a while. Because the sample only covers the first 29 days, it’s quite probable that some of the downloads counted as free would later become paid ones. This is especially true for the first period of any experiment, especially if there has been a great deal of buzz around it. Right now I think the actual rate of free downloads might be a little lower.
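To put a number on that hunch, here is a small back-of-the-envelope sketch. The 38% and 62% figures are the reported ones; the fraction of free downloaders who pay later is purely hypothetical (nobody has published it), and the function names are mine.

```python
def paid_share_after_conversion(paid=0.38, free=0.62, later_pay_fraction=0.0):
    """Share of downloaders who have paid once some free downloaders pay later.

    later_pay_fraction is a hypothetical input: the fraction of people
    counted as free downloads who eventually come back and pay.
    """
    return paid + free * later_pay_fraction


def conversion_needed(target, paid=0.38, free=0.62):
    """Fraction of free downloaders that must pay later to reach target."""
    return (target - paid) / free
```

For example, if a quarter of the free downloaders came back to pay, the paid share would be 0.38 + 0.62 × 0.25 = 0.535, that is 53.5% instead of 38%. Matching the 87% of the bagel experiment would need roughly 79% of them to pay later.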

By the way, if you like the blog you can subscribe to it here.

UPDATE: As Mathew notes, Radiohead issued a press release stating that comScore’s numbers are wildly inaccurate, although they haven’t said what the real numbers are. I expect higher percentages of paid albums.

UPDATE2: As I suspected, Thom Yorke said very recently: “In terms of digital income, we’ve made more money out of this record than out of all the other Radiohead albums put together, forever.”

Facebook’s nextgen ad platform analysis

Filed under: Business - Alex Barrera @ 2:21 am

Today, Facebook unveiled their new ad platform in New York. There is a lot of buzz around this and hundreds of blogs are posting about it. That’s why I won’t be talking about the actual system. For those interested in knowing how it works, I encourage you to read Owyang’s summary of it. Instead I’m going to try to analyze how, why and what can be done with the new system.

The social ad platform is structured around two ideas: brand awareness and friends’ trust. Some days ago I was discussing with a friend what this announcement really meant for Google. Would Google’s ad revenue be damaged by it? After reading today’s news I understand that Facebook is trying to build a brand awareness machine. This means that the objective of advertising on Facebook would be different from that of Google’s AdSense network (which is based on purchase intent). Now, the question is, which one will bring more revenue to advertisers? Generally speaking, it’s harder to trace the effectiveness of brand awareness ads than of Google ads, so will the investment pay off for marketers?

To solve this thorny problem, they are offering what they call Facebook Insights. Here is where things get fun: “Facebook Insights gives access to data on activity, fan demographics, ad performance and trends that better equip marketers to improve custom content on Facebook and adjust ad targeting”. OK, let’s analyze each one:

Activity
At first I thought this was about ad performance, but it seems to be different. Making a wild guess, I imagine they can track which pages you visit most (friends’ profiles, brand pages, group pages, etc.). They might track your normal activity within your profile pages (page views, how much time you spend on each page, which sequences of pages you navigate most often, at which hours you are most active, in my case from 11am to 2pm for example, which parts of a page you pay more attention to, etc.), so they know the best spots and time slots to feed you ads.

Fan demographics
Of course, data mining to the rescue. They are going to drill down into users’ profiles and retrieve all their information, including country, state, city or town, political views, relationship status, etc. Pretty scary, isn’t it? This is something I’ve been ranting about for some time now. People aren’t really aware of the value of their personal information or the wealth of information they put on the Internet. But most of the people screaming about this right now should read Facebook’s terms, as they clearly state that the information you pour into Facebook is theirs to use. I’m wondering what other interesting things they can retrieve from your profile. Let’s see: which networks you are linked to via your friends, which might reveal the type of friends you normally interact with. In my case, most of my friends are from Berkeley, so you can infer that I get along quite well with people from Berkeley, or that I’m interested in Berkeley. My posted items can also be analyzed to see which items I like most. Of course, the likeness application is a gold mine. They can extract (I suppose with consent from the likeness developers) which friends are “more like me” and easily target them, or vice versa. The same can be applied to the Wall application (no consent needed here, as it’s Facebook’s own).

Ad performance
How are they going to track this? I assume they’ll count all the hits on either a banner or the user’s news feed. Standard procedure here. It will be interesting to see which one gets the better hit ratio. Intuition tells us that the news feed will be the winner, but intuition isn’t always right. We’ll have to see some numbers. I haven’t seen any indication yet of price differences between banners and news feed ads; I’m assuming they’ll probably be different.

Trends
I speculate they’ll show some nice graphs where you can see how the campaign is going. Brand tracking might fall under this category. Zuckerberg said in the press release that you would be able to track your brand through Facebook’s public forums. I wonder if this will extend in the future to personal walls or even inboxes. That’s a scary thought, even though the marketer won’t be able to track which wall or inbox the buzz came from.


Some wild guesses on the outcome of this new ad system. I think it really hits a sweet spot, but as some people have already said, it’s going to depend on the implementation and the way they roll it out. For example, they are creating a new niche for application developers that want to target business and brand profiles. I wonder if the interaction between Facebook members and business pages will make websites like go away. I have some doubts about viral spam spreading. Facebook has been clear in their privacy policy about these new features: “Facebook users will only see Social Ads to the extent their friends are sharing information with them. […] In keeping with Facebook’s philosophy of user control, Facebook Beacon provides advanced privacy controls so Facebook users can decide whether to distribute specific actions from participating sites with their friends”. Now my question is, will I be able to change my news feed preferences to limit or filter spam noise? I currently have around 88 friends on Facebook, not too many, so I might bear with the noise, but what will Robert Scoble do with his 5000 friends (myself included)? Maybe he’ll finally thank Facebook for setting the 5000 friends cap. Another question that comes to mind is, will marketers be able to control the text that gets injected into someone’s friends’ news feeds? That could be very interesting, as personalized messages or specially crafted texts can make a big difference in marketing.

All comments are welcome. I want to know what people think about the future possibilities of the system, or even whether they are thinking about using it. Any new ideas for more data mining on my Facebook profile?

UPDATE: I received the first spam message from Robert Scoble. He created a brand page for himself so you can join and be a fan of Scoble. It’s going to be very interesting to see how all this develops.


November 5, 2007

Powerset’s internal problems

Filed under: Business - Alex Barrera @ 3:15 pm

Powerset is a San Francisco based startup that is trying to build the next generation search engine. Co-founded by Barney Pell, Steve Newcomb and Lorenzo Thione in late 2005, it has already raised $12.5 million in a series A investment round. Powerset is attempting to develop a public, global semantic search engine. Current search engines like Google are based on keywords: you need to type the right keywords to find what you are looking for. Even though this approach works great (just take a look at GOOG’s soaring stock), with today’s rivers of information we need smarter ways to find what we are looking for. For many, semantic search is the next logical step. Instead of searching for keywords, you ask, in plain English, what it is you are looking for, just the same way you would ask a human.

I don’t want to jump to any conclusions. Building a search engine is a really tough job, and applying AI and natural language processing algorithms to it is even harder. I’ve been there and I know it well. So I understand why it’s taking so long for them to release a product to the public. But creating the buzz Powerset did and then not delivering a product within a year is a tough call. And if that wasn’t enough ammo for critics, the recent stepping down of their CEO and the departure of one of the co-founders don’t help.

Let’s analyze the situation in detail. Why would Barney Pell step down as CEO? As he explained in his blog: “After extensive thought and reflection, the Board and management team decided that the time was right for us to bring in a new CEO to take the company to the next level and for me to transition into the role of CTO”. Well, why would that be? After all, Powerset doesn’t have a public product yet (not until 2Q of 2008), isn’t making any revenue, and isn’t getting ready (AFAIK) for a new round of investment, much less for an IPO. What next level is that, then? Most rumors point to pressure from investors who are getting nervous. I don’t have all the facts, and as such I won’t jump to conclusions. Either way, I do think it’s a very unwise move for a startup that is on the ropes in terms of credibility.

To make matters worse, Steve Newcomb, one of the co-founders, is leaving the company. It’s funny how all the other related posts have focused only on Pell’s stepping down instead of the departure of a co-founder. I personally think this says much more about what the inside situation “might” be. It’s quite strange that, as Barney puts it, “Steve lead the company internally and brought strengths in execution on several other fronts”, and nevertheless he’s being “expelled” from the company. At least that’s the image being projected. It isn’t usual for a co-founder and leader to leave the company before it has a product. I’m quite sure the work isn’t finished yet and that there are more reasons for his departure. As I’ve said before, I’m only speculating on this as I don’t have all the facts. Still waiting for Steve to post something on his blog.

For me, it’s not only that the external image of the company is being damaged by this management change. Worse than that is the fact that people inside the company might be suffering from this restructuring. I would love to hear opinions from Powerset engineers and what their views on all this are.

Just for the record, I’ve been following Powerset for some time now. I think they have brilliant technology and very good strategic partners. I don’t think they should go to the deadpool (yet), as they have yet to show their technology during 2008. As I’ve said, it’s a tough field and I’m confident they’ll produce some nice technology in the near future. Again, inside views on the matter are greatly appreciated.



November 4, 2007

Radiohead and their bagels

Filed under: Business - Alex Barrera @ 9:52 pm

Hello everybody!

Finally I’ve decided to start my own blog, so if it’s your first time here, welcome to my personal blog. Today I was watching episode 122 of Diggnation when Alex Albrecht talked about Radiohead’s newest album commerce scheme. For those of you who haven’t heard of it, Radiohead has ditched the music industry and is selling their latest album, “In Rainbows”, on their website. The cool part is that if you want to download it, you are asked how much you want to pay for it (including $0, or free as in beer). Alex raised a very good question: why should people pay for the album if they have the option of getting it for free? He gave the example of a friend who downloaded it for free the first time and, after listening to it, went and paid for it. When asked, he was quick to answer (as Kevin Rose did in the same situation) that although he had downloaded it for free, he was going to pay for it.

So now you might be thinking, what the hell has this to do with bagels? Well, here it is. Currently I’m reading Freakonomics by Steven D. Levitt and there is a story about an economist called Paul Feldman and his bagel business. Mr. Feldman’s business model was quite unusual: “[…] he would deliver some bagels and a cash basket to a company’s snack room; he would return before lunch to pick up the money and the leftovers. It was an honor-system commerce scheme, and it worked. […]”. Pretty amazing by itself, but the interesting point is that, as an economist, he was able to analyze the percentage of customers who paid and the percentage who stole from him. His conclusions were pretty staggering: the overall rate of paying customers was around 87% by the summer of 2001 and went 2% higher after 9/11. So, with these numbers at hand, we can say that humans are, in general, honest, which might go against what intuition tells us.

Nevertheless, I’ve been wondering whether such a high rate was due to fear of being accused of theft by coworkers or to innate human honesty. Now, back to Radiohead: Alex’s question reminded me of the bagel business. Do people pay because they are honest or for some other reason? I would love to see Radiohead’s numbers on their experiment. Would they yield the same rates as Mr. Feldman’s or, on the contrary, be far lower due to the anonymous nature of the Internet? Humans can do terrible things if they know no one is watching. So, do people pay out of fear of what friends might think of them, or because they are really honest and value the album? It’s interesting that, prior to downloading the album, they ask you for all your personal details, including your country and zip code. So not only are they getting free marketing, they are also harvesting a pile of very valuable data. It could even be sold to other artists or bands willing to follow their path (like Madonna). There you go, another way of generating revenue from this experiment.

What do people think about this?
