Sency releases several new features!

January 25th, 2010 by Charles S. Knight
Posted in CEO Views, News, Updates | No Comments »


Sency has recently upgraded its site with the following new features:

Sharing – Sency now makes it extremely simple to share both the real time results and the real time links found on its site. You will see share links under each result and link on Sency.com. You can share this content on your Facebook and Twitter pages, and you can also easily email it.

Link Mentions – When you visit Today’s Most Popular links for a particular search term, you can see how many times each link was mentioned. You can now click on the blue mentions box to see the last 20 users who have mentioned this link on the real time web. This can give you an idea of who is sharing the links that are popular on the real time web right now.

All links posted by a respective user – When you get to a user page, which shows information about and a picture of the respective user, you will see a link to all links posted by this user. This will show you the last 20 links posted by this user on the real time web.

The Top 100 – Sency has published a Top 100 page which shows you 100 things going on over the past few days. While the Sency homepage shows you 10 topics going on lately, this new page goes much deeper to give you a wide range of topics which have been talked about recently on the real time web.

Ex: RT @zephomega The dot over the i is called a tittle. #uselessFacts

Sency will continue to add more features, as it is becoming a valuable resource for internet users looking to find out what’s going on right now.

Evan Britton
Founder, Sency
Sency – What’s Going On?

Please see this related video from Sency – editor.

The six degrees of distribution in search.

January 19th, 2010 by Guest Author
Posted in CEO Views, Guest Authors, Realtime | 1 Comment »

Or, everything you ever wanted to know about Peer-to-Peer (p2p) distributed / decentralized search, but were afraid to ask.

I’ve known Wolf and Gosia Garbe from FAROO for a long time, and so I asked Wolf if he would write a guest post explaining the differences between the single-index, Google-type search architecture and the decentralized / distributed peer-to-peer model that FAROO is based on.

What follows is a pretty long essay, but it is meant to be. If you want to be up on what challenges Google is going to face in the coming decade, you really must allocate some time to read this article.

A guest post by Wolf Garbe, FAROO

Crisis reveals character, and this is especially true for distributed systems. Everything beyond the standard case may lead to a crisis if not considered beforehand.


The network wide scale adds a new dimension to everything, completely changing the perspective and putting many centralized approaches into question.

Joining peers, updates and recovery look different from a bird’s eye view than from the ground:

*When the network size grows, the bootstrapping algorithm needs to scale.

*Even if the whole system fails and all peers want to reconnect at the same time, the system should be able to recover gracefully.

*Every system needs to evolve over time, hence software distribution is required to work at large scale, perhaps frequently or immediately.

That’s why it is important to look at the scaling of all operational aspects, not only at the main search functionality. The weakest element defines the overall scalability and reliability of a system.

The benefits of a distributed architecture (such as low cost, high availability and autonomy) can be fully realized only if the operational side is also fully distributed.

Nowhere should there be a centralized element which can fail, be attacked or be blocked as a single point of failure, or which simply does not scale. Not for crawling, not for indexing & search, not for ranking and discovery, not for bootstrap, and not for update.

Let’s have a closer look at those six degrees of distribution:

Distributed Crawling

Sometimes only the crawler is distributed, while the index and the search engine are still centralized. An example is the Grub distributed crawler, once used by Wikia Search of Wikipedia Founder Jimmy Wales.

A distributed crawler itself provides only limited benefit. Transferring the crawled pages back to a central server doesn’t save much of the bandwidth, compared to what the server would need to download the pages itself. Additionally there is overhead for reliably distributing the workload across the unreliable crawler elements.

The benefits of such a hybrid approach lie rather in applications beyond a search engine: when only selected information is transferred back (like scraped email addresses), and when the spider should be harder for the webmaster to detect and block, as the load comes from different IPs.

Distributed crawling will live up to its promises only as part of a fully distributed search engine architecture, where the crawlers are not controlled by a single instance but crawl autonomously, led solely by the wisdom of the crowd of its users. Huge network wide effects can be achieved by utilizing the geographic or contextual proximity between distributed index and crawler parts.

With FAROO’s user powered crawling, pages which change often (e.g. news) are also re-indexed more often. So the FAROO users implicitly control the distributed crawler in such a way that frequently changing pages are kept fresh in the distributed index, while unnecessary traffic on rather static pages is avoided.
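To make the idea a bit more concrete, here is a minimal sketch in Python. It is not FAROO’s actual code: the class name, the method names and the use of a content hash to detect changes are all assumptions made for illustration. It only shows how user visits alone can decide which pages get re-indexed.

```python
import hashlib


class VisitDrivenIndex:
    """Toy index: a page is (re)indexed only when a user visit reveals new content.

    Hypothetical illustration; names and the hashing approach are assumptions.
    """

    def __init__(self):
        self._last_hash = {}      # url -> hash of the content indexed last
        self.reindex_count = {}   # url -> how often the page was (re)indexed

    def on_visit(self, url: str, content: str) -> bool:
        """Called whenever a user visits a page. Returns True if it was re-indexed."""
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        if self._last_hash.get(url) == digest:
            # Page unchanged since the last visit: no indexing work, no extra traffic.
            return False
        self._last_hash[url] = digest
        self.reindex_count[url] = self.reindex_count.get(url, 0) + 1
        return True
```

In this toy model a news page that changes between visits is re-indexed on almost every visit, while a static page is indexed once and then left alone.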

Distributed Discovery

Even for the big incumbents in the search engine market, it is impossible to crawl the whole web (100 billion pages?) within minutes in order to discover new content in a timely manner (billion pages per day). Web scale real time search becomes feasible and efficient only if the crawler is selectively directed to the newly created pages, instead of looking for the needle in the haystack.

By aggregating and analyzing all web pages visited by our users for discovery, we utilize the “wisdom of crowds”. Our users are our scouts. They bring in their collective intelligence and steer the crawler to where new pages emerge. In addition to instantly indexing all visited web pages, our active, community directed crawler also derives its crawl start points from discovered pages.

Beyond real time search this is also important to discover and crawl blind spots in the web. Those blind spots are formed by web pages, which are not connected to the rest of the web. Thus they can’t be found just by traversing links.

Distributed discovery also helps with indexing the deep web (sometimes also referred to as the hidden web). It consists of web pages that are created solely on demand from a database, when a user searches for a specific product or service. Because there are no incoming links from the web, those pages can’t be discovered and crawled by normal search engines, although these are starting to work on alternative ways to index the hidden web, which is much bigger than the visible web.

Distributed Index & Search

Storing web scale information is not so much of a problem. What is expensive are the huge data centers required for answering millions of queries in parallel. The resulting costs of billions of dollars can be avoided only with a fully decentralized search engine like FAROO.

Incumbents already envision 10 million servers. A distributed index scales naturally, as more users also provide the additional infrastructure required for their queries. It also benefits from the increase in hardware resources, doubling every two years according to Moore’s Law.

Recycling unused computer resources is also much more sustainable than building new giant data centers, which consume more energy than a whole city.

The indexes of all big search engines are distributed across hundreds of thousands of computers within huge data centers. But by distributing the search index to the edge of the network, where both user and content already reside, the data no longer have to travel back and forth to a central search instance, which is consequently eliminated. This not only prevents a single point of failure, but also combines the index distribution across multiple computers with the geographic proximity normally achieved by spreading multiple data centers across the globe.

Last but not least, a distributed index is the only architecture where privacy is inherent in the system, as opposed to the policy based approaches of centralized search engines, where the privacy policy might be subject to change.

Zooming in from the macroscopic view, every distributed layer has its own challenges again. For the index, for example, peers usually do not behave as they should: they are overloaded, there is user activity, the resource quota is exhausted, they are behind a NAT, their dynamic IP has changed, or they just quit.

Those challenges have been perfectly summarized in “The Eight Fallacies of Distributed Computing”. Yet going into all the details and our solutions would certainly go beyond the scope of this post.

Distributed Ranking

An additional benefit is a distributed, attention based ranking which utilizes the wisdom of crowds. Monitoring the browsing habits of the users and aggregating those “implicit” votes across the whole web promises a more democratic and timely ranking (important for real time search).

While most real time search engines use explicit voting, we showed in our blog post “The limits of tweet based web search” that implicit voting by analyzing visited web pages is much more effective (by two orders of magnitude!).

This also eliminates the shortcomings of a Wikipedia-like approach, where content is contributed in a highly distributed way but the audit is still centralized. Implicit voting automatically involves everybody in a truly democratic ranking. The groups of adjudicators and users become identical, and therefore pull together for optimum results.

Distributed Bootstrap

The first time a new peer wants to connect to the p2p network, it has to contact known peers (super peers, root peers, bootstrap peers, rendezvous peers) to learn the addresses of the other peers. This is called the bootstrap process.

The addresses of the known peers are either shipped in a list together with the client software or they are loaded dynamically from web caches.

The new peers then store the addresses of the peers they learned from the super peer. The second time the new peers can directly connect to those addresses, without contacting the super peer first.

But if a peer has been offline for some time, most of the addresses it stored will have become invalid because they are dynamic IP addresses. If the peer fails to connect to the p2p network using the stored addresses, it starts the bootstrap process again using the super peers.
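The conventional bootstrap flow described above can be sketched roughly as follows. This is only an illustration of the super peer based scheme, not of FAROO’s distributed algorithm; the function name and the two callbacks (`try_connect`, `fetch_peers`) are hypothetical stand-ins for real network calls.

```python
from typing import Callable, Iterable, List


def join_network(cached: Iterable[str],
                 super_peers: Iterable[str],
                 try_connect: Callable[[str], bool],
                 fetch_peers: Callable[[str], List[str]]) -> List[str]:
    """Return the peer addresses to store after joining, or [] if joining failed.

    Hypothetical sketch of the classic super peer bootstrap.
    """
    # Second and later starts: try the addresses remembered from the last session.
    live = [addr for addr in cached if try_connect(addr)]
    if live:
        return live
    # First start, or all stored (dynamic) addresses are stale:
    # fall back to the well-known super peers.
    for sp in super_peers:
        if try_connect(sp):
            return fetch_peers(sp)
    return []
```

The fallback loop is exactly where the centralized element sits: every peer with stale addresses ends up at the same few super peers.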

Scaling
During strong network growth many peers access the super peers in order to connect to the p2p network. The super peers then become the bottleneck and a single point of failure in an otherwise fully decentralized system. If the super peers become overloaded, no new peers can join the system, which prevents further network growth.

Recovery
If the whole p2p network breaks down due to a web wide incident and all peers try to reconnect at the same time, this leads to an extreme load on the super peers.
This would prevent a fast recovery, as peers would fail to connect but keep trying, causing additional load.
Those problems have been experienced in practice in the Skype network.

Security
Another issue is that the super peers make the whole p2p network vulnerable because of their centralized nature. Both blocking and observing of the whole p2p network become possible just by blocking/observing the few super peer nodes.

FAROO is using a fully distributed bootstrap algorithm, which

*eliminates the super peers as last centralized element, as bottleneck and single point of failure in an otherwise distributed system.

*provides an organic scaling also for the bootstrap procedure.

*ensures a fast recovery in case of a system wide incident.

*makes the p2p network immune to the blocking or monitoring of super peers.

Distributed Update

The distributed system becomes automatically smarter just by the increasing relevance of the collected attention data.
But you may want to refine the underlying algorithms, to improve the efficiency of the p2p overlay, to extend the data model, or to add new functions. And the example of Windows shows that it might be necessary to apply security patches, network wide, frequently and immediately. Updating p2p clients requires a very efficient software update distribution.

With 10 million peers and a client software size of 5 Mbyte, a full network update would require distributing 50 Terabytes. Even over a 100 Mbit/s network connection, a central update would take about 50 days, and that only if you manage to distribute the updates evenly over time.
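The arithmetic behind that estimate is easy to check. The short Python sketch below assumes decimal units (1 Mbyte = 10^6 bytes, 1 Mbit/s = 10^6 bits per second) and a link that is fully used around the clock; the function name is made up for this example.

```python
def central_update_days(peers: int, client_bytes: float, link_bits_per_s: float) -> float:
    """Days needed to push one client update to every peer over a single link."""
    total_bits = peers * client_bytes * 8          # 10 million x 5 Mbyte = 50 Terabytes
    seconds = total_bits / link_bits_per_s
    return seconds / 86_400                        # seconds per day


days = central_update_days(peers=10_000_000, client_bytes=5e6, link_bits_per_s=100e6)
print(f"{days:.1f} days")  # prints "46.3 days"
```

Under these assumptions the result is a little over 46 days, which is where the round figure of 50 days comes from once any real world overhead is added.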

FAROO instead uses a distributed, cell division like update, where all peers pass on the DNA of a new version to each other within minutes. Of course, signatures are used to ensure the integrity of the network.
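Why cell division is so fast can be seen from an idealized model. The Python sketch below is not FAROO’s protocol, whose details are not given here; it simply assumes that in every round each already updated peer passes the new version to exactly one peer that does not have it yet, so the number of updated peers doubles.

```python
def rounds_to_reach(peers: int, seeds: int = 1) -> int:
    """Number of doubling rounds until at least `peers` peers hold the update.

    Idealized assumption: every updated peer updates exactly one new peer per round.
    """
    updated, rounds = seeds, 0
    while updated < peers:
        updated *= 2
        rounds += 1
    return rounds


print(rounds_to_reach(10_000_000))  # prints 24
```

In this model 24 rounds are enough for 10 million peers, and each peer only ever uploads the 5 Mbyte client a handful of times instead of one server uploading 50 Terabytes.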

Divide and Conquer

By consistently distributing every function we ensured true scalability of the whole system, while eliminating every single point of failure. Our peers are not outposts of a centralized system, but rather part of a distributed Cyborg (combining the power of users, algorithms & resources) living in the net.

This is a system which works on a quiet sunny day, but also on a stormy one. It would even be suitable for extreme mobile scenarios, where peers are scattered across a battlefield or carried by a rescue team.

The system recovers autonomously from a disaster. Even if there is no working central instance left, the surviving peers find each other and form a powerful working distributed system again once they awake. If you have seen the Terminator reassembling after being run over by a truck, you get the idea.

In biology, organisms naturally deal with the rise and fall of their cells, which as simple elements form superior systems. We believe that evolution works in search too, and that the future belongs to multicellulars.

Relevance ranking in the social media age.

January 15th, 2010 by Charles S. Knight
Posted in CEO Views, Guest Authors | No Comments »

By Michael Lee, CEO of Thoora

Did you know that there are currently over 133,000,000 Blogs?

It’s a big number. Now add to that traditional media and journalism sources and the endless stream of content on Twitter, and it is hard not to notice that the media landscape has changed and continues to evolve. Good or bad, this is a reality that makes online search in this space quite important as demand for and expectations of news delivery migrate to an online model. It also makes it apparent that the need for proper filtering and search of media has never been so critical.

At Thoora.com, we believe the key to delivering relevancy in media search comes down to two aspects: aggregation and ranking algorithms.

First, aggregation. Most people have chosen to solve the information overload by visiting a handful of preferred sites, be it a newspaper’s website or a blogging site that hand-selects a few chosen bloggers and posts each day. While this approach certainly addresses the problem of content overload, it also limits exposure to new voices and viewpoints. In some cases, content on these sites is selected by a small group of editors, and readers would need to visit multiple websites to see the full dimension of an individual story.

Aggregation and clustering of stories across social and traditional media is key. People want to see a fact-checked article from the AP, a well-researched piece from the Los Angeles Times, a brilliant and insightful post from a blogger, and maybe even the latest real-time reactions from Twitter. In other words, people are looking for the complete picture to each story without having to visit multiple places to watch the story develop.

When we’re talking about a large pool of content, with the goal of giving each voice an equal opportunity to be heard, ranking is critical. Cutting-edge filtering and ranking algorithms can completely automate the process, so there’s no human bias or editorial selection involved. Additionally, these algorithms can factor in the ‘wisdom of crowds’ by taking people into account (all of us, not just a select few).

For example, algorithms should look at the activity (or ‘signals’) surrounding each story: which articles and posts are generating the most comments, the most tweets, the most sharing and forwarding. These are the hot stories of the day and should appear up top in the rankings. Some search engines are starting to look at votes and links from social bookmarking sites. While voting and like/dislike are certainly good ways to incorporate the human element into determining relevancy and value, we unfortunately know that voting can often be swayed by successful PR efforts.

At Thoora, we remove the risk of human bias by looking at implicit actions, as opposed to votes or link analysis. We incorporate people’s every day actions — such as comments, tweets, and forwards — into our ranking algorithm to determine which stories are generating the most buzz with minimal distortion.
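As a rough illustration of ranking by implicit signals, consider the Python sketch below. It is not Thoora’s algorithm: the `Story` fields, the equal weights and the simple weighted sum are assumptions chosen only to show the principle of scoring stories by the activity around them rather than by votes.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Story:
    title: str
    comments: int = 0
    tweets: int = 0
    forwards: int = 0


# Hypothetical weights; treating all signals equally is an assumption.
WEIGHTS = {"comments": 1.0, "tweets": 1.0, "forwards": 1.0}


def buzz_score(story: Story) -> float:
    """Weighted sum of the implicit signals surrounding a story."""
    return (WEIGHTS["comments"] * story.comments
            + WEIGHTS["tweets"] * story.tweets
            + WEIGHTS["forwards"] * story.forwards)


def rank_stories(stories: List[Story]) -> List[Story]:
    """Stories with the most activity come first."""
    return sorted(stories, key=buzz_score, reverse=True)
```

A production system would presumably also normalize for source size and decay old activity over time, but the basic point stands: nobody has to cast a vote for a story to rise.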

People can use Thoora to discover the day’s biggest stories by browsing the top ranked stories in key categories (e.g. business, entertainment, sports, lifestyle…) or mine the hidden parts of the blogosphere and news with specific searches. We have indexed more than 100 million blogs and collect content from 720,000 blog posts and articles from 4,500 traditional media sources each day. Our content is Algorithmically Curated and our technology is completely automated; we don’t play favorites. For in the end, just as new voices appear on the web each day, technology will need to continually evolve in order to make sure these voices can be heard, without bias, and without making too large a demand on the reader’s time.

Shopping search engine Pic2shop *update*

January 13th, 2010 by Charles S. Knight
Posted in CEO Views, Shopping, Verticals | No Comments »

Pic2shop is the first barcode reader and comparison shopping app for the iPhone. It is free and available on the iTunes App Store worldwide (http://itunes.com/apps/pic2shop). Have you ever been in a store, not knowing whether to buy immediately or do some more research at home? Just start pic2shop, scan the barcode with your iPhone’s camera, and get an instant price comparison with online retailers. Need more information? Go straight to the product page and read user reviews, check specifications, etc.


When our company, Vision Smarts, started working on pic2shop nearly a year ago, there was no barcode reader available for the iPhone. There were a few very successful apps for other platforms, notably ShopSavvy and CompareEverywhere for Android. But on the iPhone, most people thought it was impossible to decode UPCs or EANs, because the iPhone 3G did not have an autofocus camera. Images of the barcodes were hopelessly blurry:


Top: blurry barcode captured by the iPhone 3G
Bottom: barcode captured by an autofocus camera, like the newer iPhone 3GS

Thanks to our expertise in image processing, it seemed like an interesting challenge, and we set out to write the signal processing algorithms that eventually led to the release of the first version of pic2shop in April. It was far from perfect, and not everybody found it easy to use, but it worked! About a month later, we were surprised to see another startup come out with a similar product. Since then, we have been engaged in a competition to make barcode scanning faster and more accurate, and to give our users better search results. That other startup is the maker of the wildly successful RedLaser app, now the #1 paid app on the App Store.

While RedLaser is a paid app, pic2shop is free, and we are committed to keeping it free. It seems only fair, since some retailers give us commissions on sales (we display all the results we find, regardless of whether we get a commission or not), and pic2shop is ad-supported. The new version, due in a few days, will feature a fully redesigned user interface and faster scanning. We are also continuously adding more retailers to the pic2shop results, and introducing more convenience functions like emailing the list of scanned items, or searching other sites.

There is now a flurry of barcode apps for the iPhone. Some work only on the newer iPhone 3GS, because it has an autofocus camera. Most of the other apps license the technology from the same two sources. The pic2shop barcode scanner is available as an affordable and easy-to-integrate SDK. Please visit our web site (http://www.visionsmarts.com/products/products.html) for all the details.

Just weeks after pic2shop came out, various companies started contacting us to include barcode scanning in their own apps. We cannot disclose the specifics, but we are well positioned to predict that the most creative uses of mobile barcode scanning are yet to come!

Concerning pic2shop and shopping apps in general, it is safe to say that they will become mainstream very quickly. Not just for price comparisons, but also for coupons, product traceability, recalls, demonstration videos, etc. Apple (with the iPhone) and Google (with Android) have shown people that smartphones are cool and useful, not just another way to be tethered to the office. Thanks to them, mobile computing is the most exciting domain for developers since the Internet boom.

Benoit Maison
Founder,
Vision Smarts SPRL
Belgium

Some useful links:

pic2shop on the iTunes App Store

Vision Smarts’ Barcode reader SDK

Follow pic2shop on Twitter and Facebook

This just in:

Vision Smarts is proud to announce the pic2shop Web Integration Kit, a companion to their free barcode scanning App for iPhone devices. Developed specifically for online retailers, shopping portals and web developers alike, the pic2shop Web Integration Kit lets their customers look up a product by simply scanning a barcode with an iPhone.

UPCs and EANs are the traditional barcodes found on virtually all retail products, from books to cereal boxes. With pic2shop’s unique combination of image processing algorithms and machine learning techniques, looking up a product is as easy as aiming the camera at a barcode. Online retailers and price comparison sites can now offer their customers the means to instantly access product information, or buy in their store, all in real time. The whole process takes only a few seconds.

Shopping portals can capitalize on mobile shoppers’ interest in barcode scanning at no cost. By adding a feature that is highly valued by users, retailers who have already invested in building a mobile-friendly web site will increase their visibility and return on investment. All that is necessary is inserting a few lines of code into their existing web application to display a “Scan Barcode” button for iPhone-bearing visitors (and Android users too).

The pic2shop Web Integration Kit is perfect for:
* Online retailers
* Price comparison sites
* Product reviews (books, dvds, games, consumer electronics, etc)
* Product information (e.g. nutritional, environmental)
* Gift lists and shopping-oriented social media

When the customer presses the “Scan Barcode” button on the web page, their iPhone launches the pic2shop barcode scanner app.
If pic2shop is not installed, a message is briefly displayed, then the user is automatically taken to the iTunes App Store where they can install pic2shop for free.

If pic2shop is installed, it goes immediately into scanning mode. By aiming the camera at the barcode for a second or two, the UPC or EAN is read (even on older iPhone models). As soon as the code is recognized, pic2shop opens the Safari web browser to search the web application with the barcode digits. The entire process takes only a few seconds, and is much faster than typing 12 digits or a product name using the on-screen keyboard. And considerably more fun.

Among the API’s most notable features are its zero cost of implementation, its ability to help retailers enhance the shopping experience, and its ability to generate buzz about their products. By taking advantage of the iPhone’s custom URL scheme, any app or web page can launch pic2shop, have the user scan a barcode, and then pass the barcode back to the calling web page or app. Vision Smarts has made a page specifically for developers available on their website. For application developers, there is a complete Xcode project that shows how to implement this mechanism in just a few lines of code, and for web developers there is an online demo.
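To give web developers a feel for the round trip, here is a small server-side sketch in Python. Please treat the details as assumptions: the `pic2shop://scan?callback=…` form and the `EAN` placeholder are used here for illustration only and should be checked against the Vision Smarts developer page, and `shop.example` is a made-up site. The sketch builds the link behind a “Scan Barcode” button and simulates how the scanned digits would come back to the web application.

```python
from urllib.parse import parse_qs, quote, urlparse

PLACEHOLDER = "EAN"  # assumed token that the scanner replaces with the barcode digits


def build_scan_url(callback_template: str, scheme: str = "pic2shop") -> str:
    """Build the custom-scheme link for a "Scan Barcode" button (format is an assumption)."""
    if PLACEHOLDER not in callback_template:
        raise ValueError("callback must contain the placeholder " + PLACEHOLDER)
    # The whole callback URL is percent-encoded so it survives as one query value.
    return f"{scheme}://scan?callback={quote(callback_template, safe='')}"


def resolve_callback(scan_url: str, barcode: str) -> str:
    """Simulate the scanner side: decode the callback and insert the scanned digits."""
    template = parse_qs(urlparse(scan_url).query)["callback"][0]
    return template.replace(PLACEHOLDER, barcode)


link = build_scan_url("https://shop.example/lookup?code=EAN")
print(link)
print(resolve_callback(link, "0012345678905"))
```

The design choice worth noting is that the calling page never talks to the scanner directly: it only hands over a URL template, and the browser is reopened on the filled-in URL once the code has been read.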

pic2shop Feature Highlights:
* Comfortable and simple User Interface
* Takes only a few seconds
* Compare prices and decide whether to purchase in-store, or to buy online, all in real time
* Allows other iPhone apps and web apps to use pic2shop as a free barcode scanner

“Shopping by barcode is one of those things that make the mobile internet different, and so exciting,” said Benoit Maison, founder of Vision Smarts. “You just can’t do that at your desk. With this new feature of pic2shop, any shopping web site can provide additional value to their users, at no cost. Our objective is to make pic2shop the standard iPhone barcode reader.”

Wisdom Gained!

December 19th, 2009 by Guest Author
Posted in CEO Views, Guest Authors | 1 Comment »

Trackle is a new service that tracks all of your personalized information on the Web, all in one place. Offering the industry’s most comprehensive index of popularly tracked categories, Trackle keeps tabs on everything in your life, from local, to social to shopping. The service is built with advanced algorithms that provide a layer of intelligence to the tracking process – ensuring timely and relevant results. Users get automatically notified over the Web, email or SMS whenever Trackle finds a match. Trackle is built on top of a robust tracking platform, designed to offer a wide range of tracking services. The service is completely free, secure and privacy compliant.

Pavan Nigam: CEO and Founder

It was about ten months ago when Trackle was launched and I have learnt a lot since then – from the usage of Trackle by our users, and from broader trends in the consumer websphere.

Let me first talk about Trackle.  I have been very impressed by the pace of user adoption.  Our subscriber community has been doubling every couple of months.  The most popular items being tracked remain crime, shopping deals, people (especially their own names), jobs, travel, stocks, sports and real estate.  An average user tracks about ten items which generate about twenty alerts per day.  Most users seem to be happy with getting these alerts via their daily digest – and many of them look forward to it.  On many occasions, when there were delays in our email server, we have received urgent emails from subscribers inquiring about their digest status – a few even called!  Of course, we would rather have subscribers go to trackle.com more often so that they can discover new tracklets and new features that we add regularly.

The most popular tracklets for SMS alerts are sports scores, breaking news, and select social networking events (e.g. Facebook posts).  Now comes the surprising part.  I am amazed at how many professionals are using this service.  Marketing and Sales folks are using it for tracking companies – competitors, prospects and customers.  Recruiters are using it to track people.  I even know of a detective agency using it for tracking a variety of items including classified and media site postings.

The amount of information on the web has become so overwhelming and so multi dimensional (news, blogs, media sites, social networks and more), that it can be incredibly time consuming to stay on top of this varied information – and, that assumes that you have the time to do that in the first place.  Our product team is devising new features to simplify some of these tasks.

In the websphere, the most astonishing trend has been Twitter.  For a company that most of us had not heard about a year ago, it is amazing to see its pervasiveness.  It has become synonymous with real time search – yet another term that was not in the mainstream lexicon a year ago.  Now, everyone is tweeting – from Larry King to Paris Hilton to Kobe Bryant.  And, so many (so called) real time search companies are trying to mine these mounds of data to get to the few hidden gems of information.

Of course, the fundamental flaw is that the gems are few and far between since the bulk of the tweets are mundane and frivolous.  For now, Twitter appears to be the domain of Marketing and PR folks who are running all kinds of experiments to gain experience in extracting value from these billions of tweets.

The other major trend is the promotion of APIs by social networks such as Facebook and LinkedIn.  Companies are exploring ways of integrating these services in their own offerings along with extracting value from the immense community generated information that became accessible through these APIs.

I am convinced that there is a tremendous value proposition in mining an aggregation of all this data – whether it comes from tweets, Facebook posts, blogs, news sources, media sites and more.  And, I am sure that some startups somewhere in Silicon Valley are already working on that!

Source: Trackle CEO blog here: