March, 2010


31
Mar 10

What’s all this fuss about Real Time Search?

Using a search engine is the fundamental starting point for discovering the world’s information. Real time search makes this information available quicker than ever before. Google and Bing now bring you the information you are looking for seconds after it becomes available on the internet. This amazing innovation is made possible by a simple new concept called PubSubHubbub.

What are you talking about?

The basic concept of a search engine pre-2009 was simple enough: have a bunch of web page addresses stored in a database and crawl them as fast as you can to update the information. As simple as this sounds, it is an extremely inefficient way of retrieving information. This way of aggregating information is known as the “pull method”. However, the guys over at Google designed a new concept called PubSubHubbub which, surprisingly enough, wasn’t thought of years before. Instead of search engines constantly pinging your website looking for new information, why not just let them know when new information is available? This is known as the “push method” and is a far more efficient way of sharing and aggregating information on the web.
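To make the “push method” a little more concrete, here is a rough sketch of what a publisher’s ping to a hub could look like in PHP. The parameter names hub.mode and hub.url come from the PubSubHubbub spec; the function names, the example feed address and the timeout are my own assumptions for illustration, and the sketch assumes your feed already declares which hub it uses.

```php
<?php
// Build the form-encoded body of a "new content" ping for a hub.
// hub.mode and hub.url are the parameter names from the PubSubHubbub spec.
function build_publish_ping($feedUrl) {
    return http_build_query(array(
        'hub.mode' => 'publish',
        'hub.url'  => $feedUrl,
    ), '', '&');
}

// POST the ping to the hub. Returns true if the hub accepted the request.
// (Hypothetical helper; not called here so the example runs without a network.)
function send_publish_ping($hubUrl, $feedUrl) {
    $context = stream_context_create(array('http' => array(
        'method'  => 'POST',
        'header'  => "Content-Type: application/x-www-form-urlencoded\r\n",
        'content' => build_publish_ping($feedUrl),
        'timeout' => 10,
    )));
    // file_get_contents() returns false on a connection failure or an HTTP error status.
    return file_get_contents($hubUrl, false, $context) !== false;
}

// Example value only: swap in the address of your own feed.
echo build_publish_ping('http://example.com/feed/'), "\n";
```

You would call send_publish_ping() with your hub’s address right after publishing a new post, so the hub can fetch the feed and pass the update on to its subscribers instead of everyone having to poll your site.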

What content is being indexed in real time?

Google has managed to aggregate all sorts of content in real time. Some real time content I have noticed on Google originated from these types of websites:

  • Blogs.
  • Forums.
  • Online News Sources.
  • Social Media (Twitter, Facebook, FriendFeed etc).

How does this affect my website(s)?

As far as I know, WordPress-hosted blogs (yourblogname.wordpress.com) have automatically been updated to include this new feature. Therefore, if you host your blog on WordPress, chances are your content is already being indexed in real time. However, if you run a standard website, integrating this new technology is slightly more complicated. Because PubSubHubbub is still relatively new to the internet, there aren’t many tutorials available to the average Joe. Consider contacting an experienced web developer.

However, it is still unclear whether Google relies solely on PubSubHubbub to gather all this real time information. This was mentioned on the official Google blog:

Our real-time search features are based on more than a dozen new search technologies that enable us to monitor more than a billion documents and process hundreds of millions of real-time changes each day. Of course, none of this would be possible without the support of our new partners that we’re announcing today: Facebook, MySpace, FriendFeed, Jaiku and Identi.ca — along with Twitter, which we announced a few weeks ago.

Real time search in action
If you would like to see real time indexing in action, go to Google. Enter a search phrase and hit enter. Once the results page has loaded, click on the “Show Options” link above the results section. A menu will appear on the left of the screen. Click on the “Latest” link and Google will show you all the latest content it has fetched for that particular search phrase. The page will automatically update with new content as Google indexes it. Very cool!

Real Time Indexing

As you can see, the world is becoming a far smaller place than we once thought. Information is now literally at your fingertips. Obviously people will have different views on how relevant these results are, and I’m sure Google is having a tough time keeping their algorithm up to date. This is a very exciting time for the internet as a whole. One of the things I like about real time content is that it fuels the ongoing advances of augmented reality.


30
Mar 10

Watch the Large Hadron Collider Experiments LIVE at 8:30am this morning

The Large Hadron Collider (LHC) in Geneva, Switzerland will perform the first-ever proton collision this morning at 8:30am our time. This is truly a remarkable day for science and the best thing is, you can watch it live on the CERN website. Five webcasts will be available today and the main webcast will include live footage from the control rooms for the LHC accelerator and all four LHC experiments and coverage of the press conference to announce the first collisions. Enough to excite the “inner-geek” in you!

There have been rumours that these experiments will create black holes and destroy the earth, all of which are ridiculous of course! Enjoy the show…!


17
Mar 10

myScoop slowly climbing the ranks

It would appear that myScoop’s popularity is steadily growing (at least according to Alexa). I thought I would take the time to share the graph below with myScoop’s supporters. Thanks guys!

myScoop vs Afrigator vs Amatomu



14
Mar 10

myScoop has a new home!

myScoop has moved over to its new host with only a few glitches here and there. The near-seamless transition took place between 8pm on Saturday 13 March and around 6am on 14 March. The new host has tons more power than the previous one and allows for more bandwidth usage, which makes way for more innovative features that you will be seeing soon. The host is also running a more secure backend, which in turn means more security for you.

The reasons
As any South African web entrepreneur would know, bandwidth in South Africa is hellishly expensive. This was one of the main reasons for moving the server to England. Keep in mind, though, that with the extra bandwidth and processing speed offered by the new host, I can develop many more features that would be of interest to you in the management and reporting of your blogs and blog statistics.

The “glitches”
I need to apologise to everyone: at around 4am this morning, whilst attempting to transfer a 1GB database, I accidentally deleted ALL visitor statistics for 12 March 2010. It was a silly error on my side and I apologise for the loss of this data. There will also be some data loss for today because, whilst the propagation was taking place throughout the internet, some stats were being recorded on the old server and some on the new. I left the old server up during this time because I did not want to jeopardise the loading speed of your blog’s webpages by the traffic badge not being able to find its source.

The “new home”
For those of you who are interested, here is some information regarding the new server:
Bandwidth Allocation: 200GB
Guaranteed RAM: 128MB
Burst RAM: 256MB
CPU: 4 x Xeon
OS: CentOS
Location: England

The feedback
With a big move like this, and the new server running a completely different operating system, bugs are inevitable. If you find any bugs or experience a problem on the site, please get in touch with me at nickduncan.sa@gmail.com

Once again, thank you for your patience and understanding while myScoop made its move to its new home. Until next time, happy scooping!


13
Mar 10

Extracting a domain name with PHP and Regular Expressions

As most of you know, regular expressions can be a nightmare if you don’t know much about the subject. Here is a quick tutorial on how you can extract the domain name from any URL using regular expressions and PHP. It handles both http:// and https:// URLs.

 
<?php 	
$link1 = "http://nickduncan.co.za/";
$link2 = "http://nickduncan.co.za";
$link3 = "http://www.nickduncan.co.za/";
$link4 = "http://www.nickduncan.co.za";
 
$link5 = "https://nickduncan.co.za/";
$link6 = "https://nickduncan.co.za";
$link7 = "https://www.nickduncan.co.za/";
$link8 = "https://www.nickduncan.co.za";
 
$link9 = "http://www.nickduncan.co.za/index.php";
$link10 = "http://www.nickduncan.co.za/index.php?id=34&that=this";
$link11 = "http://nickduncan.co.za/php-header-include-%E2%80%93-saving-development-time/";
 
echo return_domain($link1)." - ".$link1;
echo "<br />".return_domain($link2)." - ".$link2;
echo "<br />".return_domain($link3)." - ".$link3;
echo "<br />".return_domain($link4)." - ".$link4;
echo "<br />".return_domain($link5)." - ".$link5;
echo "<br />".return_domain($link6)." - ".$link6;
echo "<br />".return_domain($link7)." - ".$link7;
echo "<br />".return_domain($link8)." - ".$link8;
echo "<br />".return_domain($link9)." - ".$link9;
echo "<br />".return_domain($link10)." - ".$link10;
echo "<br />".return_domain($link11)." - ".$link11;
 
 
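// Returns the host part of a URL (e.g. "www.nickduncan.co.za"), or null if none is found.
function return_domain($link) {
    // Skip an optional http:// or https:// prefix, then capture up to the next /, ? or #.
    if (preg_match('@^(?:https?://)?([^/?#]+)@i', $link, $matches)) {
        return $matches[1];
    }
    return null;
}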
?>

Another easy method is to use the PHP function parse_url, which returns the components of a URL that you ask for. The PHP manual has more than enough examples of this function.
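For comparison, here is a minimal sketch of the parse_url route. parse_url and the PHP_URL_HOST constant are built into PHP; the helper name host_from_url and the step that prepends http:// to bare addresses are my own additions for illustration, not something the function does for you.

```php
<?php
// Hypothetical helper name; parse_url() itself is a built-in PHP function.
function host_from_url($link) {
    // parse_url() only recognises the host when a scheme is present,
    // so prepend one for bare inputs such as "nickduncan.co.za/index.php".
    if (!preg_match('@^[a-z][a-z0-9+.-]*://@i', $link)) {
        $link = 'http://' . $link;
    }
    $host = parse_url($link, PHP_URL_HOST);
    // parse_url() gives null when there is no host and false for a seriously malformed URL.
    return is_string($host) ? $host : null;
}

echo host_from_url('http://www.nickduncan.co.za/index.php?id=34&that=this'), "\n";
echo host_from_url('https://nickduncan.co.za'), "\n";
```

The two echo lines print www.nickduncan.co.za and nickduncan.co.za. Like the regular expression version, this keeps the www. prefix, so strip it yourself if you only want the bare domain.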