Category Archives: php

C# MD5 File Hash and PHP Comparison

Further to my recent formless multi-threaded C# application blog post I’ve just implemented a simple MD5 hashing system to ensure that files are uploaded without the contents being changed. I’ve created a new property in my File Uploader class to calculate the MD5 hash of the files I’m uploading. You wouldn’t want to do this on the fly for larger files because of the time taken, but the XML files I’m dealing with are less than 2KB in size so it’s not an issue. Here’s the MD5 property from my uploader class.

        private string md5Hash
        {
            get
            {
                using (var md5 = MD5.Create())
                {
                    using (var stream = File.OpenRead(WebExtensionsService.XMLFolder + "\\" + this.Filename))
                    {
                        // ComputeHash returns the raw 16-byte digest
                        byte[] md5_bytes = md5.ComputeHash(stream);

                        // BitConverter gives "AB-CD-...-EF", so strip out the dashes
                        return BitConverter.ToString(md5_bytes).Replace("-", "");
                    }
                }
            }
        }

Note that you need to convert the byte array to a string, and BitConverter.ToString returns the hex pairs separated by “-” characters, so you’ll have to strip those out to make the MD5 string look normal.

Once I had the MD5 hash I simply appended it to the URL I’m uploading the file to, like this:

    byte[] byte_response = wClient.UploadFile(WebExtensionsService.UploadLocation + "?md5=" + this.md5Hash, "POST", WebExtensionsService.XMLFolder + "\\" + this.Filename);

The final step is to calculate the MD5 of the uploaded file in PHP and compare it with the hash passed in the URL. I do this in the code below, along with a couple of other checks to make sure the file is of the correct type and isn’t too big.

	if (isset($_GET['md5']))
	{
		$md5=$_GET['md5'];
	}
	else
	{
		echo "Error : No md5 hash";
		die();
	}

	//reject anything that isn't a .xml file
	if (substr($_FILES["file"]["name"],-4)!=".xml")
	{
		echo "Error : Invalid file type";
		die();
	}

	//reject files over roughly 1MB
	if ($_FILES["file"]["size"]/1024>1000)
	{
		echo "Error : File too large";
		die();
	}

	$tmp_name=$_FILES["file"]["tmp_name"];

	$calculated_md5=md5_file($tmp_name);

	if (strtoupper($md5)!=strtoupper($calculated_md5))
	{
		echo "Error : MD5 File Hash Mismatch";
		die();
	}
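
Since the C# WebClient.UploadFile call above captures the server’s output in byte_response, it can be worth finishing the PHP script by moving the file into place and echoing a short confirmation once all the checks pass, so the client side can verify the round trip. A minimal sketch, where the destination folder is just a placeholder:

	//all checks passed - move the upload into place (placeholder path) and
	//echo a confirmation that the C# client can read back from byte_response
	move_uploaded_file($tmp_name,"/path/to/xml-folder/".$_FILES["file"]["name"]);
	echo "OK : ".$calculated_md5;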

Caching Your Website Content

I’ve always tried to include some varying content on my websites because many people believe it helps your search engine rankings, the logic being that fresh content is likely to be more relevant and get a boost in the SERPs. I don’t know if that’s true or not because I can’t find anything definitive posted by anyone from Google or a similar major search engine. In any event, it seems like a good idea and I’ve been including a small amount of changing content on my website for years. Things like the last 5 blog entries or customer testimonials, mainly.

That sort of content is database driven, so rather than hit your database on every pageview you should consider generating the content on a regular basis and having your website display the cached information. I do this with PHP and CRON jobs. My PHP script generates the content and writes it to a file, and a bit of PHP in the web template includes that file to display the content. The CRON job runs the PHP that generates the content, perhaps hourly but more commonly daily.

Here’s what my CRON jobs generally look like; this one runs every 8 hours, at 14 minutes past the hour:

14 */8 * * * php /srv/www/public_html/cron-scripts/create-blog-links.php

And a skeleton PHP script to generate some content looks something like what I’ve shown below. Note that I echo out the data created because (generally) when your cron jobs are run you’ll get an email from your server displaying the output from the script.

	$now=microtime(true);

	//code to generate the content goes here, wrapped up in get_content()

	$file_name="/path/to/webroot/generated-includes/blog-links.php";

	$content=get_content();

	if (strlen($content)>0)
	{
		$file_handle=fopen($file_name,'w') or die('cannot open file');

		fwrite($file_handle,$content);
		fclose($file_handle);
	}

	echo "create-blog-links.php complete, run time: ".number_format(microtime(true)-$now,4)." seconds<br /><br />";
	echo "$content<br />";

And finally, here’s the include for my web templates that actually shows the generated content. Again, I do this in PHP.

include('/usr/www/generated-includes/latest-blog-links.php');
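
Depending on timing, the generated file might not exist yet (before the first CRON run, for instance), so you may want to guard the include. A minimal sketch:

if (is_readable('/usr/www/generated-includes/latest-blog-links.php'))
{
	include('/usr/www/generated-includes/latest-blog-links.php');
}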

I believe this process could be taken one step further (and it’s something I plan on experimenting with) by rotating parts of the static content of a website as well. I think I’d do this less frequently, perhaps weekly or monthly, and you’d want to make sure you have a large pool of static content to rotate in and out.
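
If I do get around to it, the rotation could run from CRON much like the blog links script: keep a pool of static snippets and publish one of them based on the current week. A rough sketch, where the pool directory is just an assumption:

	//pick one snippet from the pool based on the ISO week number and
	//publish it alongside the other generated includes
	$pool=glob("/path/to/webroot/static-pool/*.php");

	if (count($pool)>0)
	{
		$index=(int)date("W") % count($pool);
		copy($pool[$index],"/path/to/webroot/generated-includes/rotating-content.php");
	}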

PHP and Dates – Tearing My Hair Out

I had an interesting problem this morning with some dates in PHP. When I use dates in PHP I generally try to handle them as a unix timestamp using the local unix time of the server. I then modify this based on the client’s timezone to show them their local time. This has worked fine up until now.

November the 4th 2012 was the day that daylight savings rolled back in the USA, and this caused a problem when reports for one of my products were run across that transition day. The trouble came when taking a local server time and converting it to a client time. For example, using something like this:

$date=mktime(8,0,0,11,4,2012); //set to 8 AM on November 4
$date_time=new DateTime("now",new DateTimeZone('America/Los_Angeles'));
$date_time->setTimestamp($date);
echo "date_time:: ".$date_time->format("D j m y h:i:s");

Resulted in a different time to:

$date=mktime(8,0,0,11,5,2012); //set to 8 AM on November 5
$date_time=new DateTime("now",new DateTimeZone('America/Los_Angeles'));
$date_time->setTimestamp($date);
echo "date_time:: ".$date_time->format("D j m y h:i:s");

Which normally isn’t a problem, except when you’re cycling through a date range one day at a time by adding (24*60*60) seconds for each day and expecting the client time to be indexed one day at a time too. But it’s not: on one of the days there’s going to be an extra hour (or an hour less). So the solution (and it’s horrid) is to convert the server time to local client time, add a day to that, and then convert it back to server time. Yuck. But it works, and it works something like this:

	//
	//convert to local time
	//
	$start_date=mktime(8,0,0,11,1,2012);
	$date_time=new DateTime("now",new DateTimeZone('America/Los_Angeles'));
	$date_time->setTimestamp($start_date);

	for ($i=0;$i<7;$i++)
	{
		if ($i>0)
		{
			//add one calendar day in the client's local time rather than 86400 seconds
			$date_time->setDate($date_time->format("Y"),$date_time->format("m"),$date_time->format("j")+1);
		}

		echo "day ".$i." local time ".$date_time->format("D j m y H:i:s")."<br />";
		echo "day ".$i." server time ".date("D j m y h:i:s",$date_time->getTimestamp())."<br />";

		//get_first_day_of_week() is a helper of my own that works from the converted timestamp
		$first_day_of_week1=get_first_day_of_week($date_time->getTimestamp(),new DateTimeZone('America/Los_Angeles'));
		echo "day ".$i." first_day_of_week:: ".date("D j m y h:i:s",$first_day_of_week1)."<br />";
	}
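
For what it’s worth, DateTime can also handle the day arithmetic itself: modify('+1 day') adds a calendar day in the object’s timezone, so the local hour stays put across the DST change. A minimal sketch of the same loop, leaving out my helper function:

	$date_time=new DateTime("now",new DateTimeZone('America/Los_Angeles'));
	$date_time->setTimestamp(mktime(8,0,0,11,1,2012));

	for ($i=0;$i<7;$i++)
	{
		echo "day ".$i." local time ".$date_time->format("D j m y H:i:s")."<br />";
		echo "day ".$i." server time ".date("D j m y h:i:s",$date_time->getTimestamp())."<br />";

		//adds one calendar day in the Los Angeles timezone, so the local
		//hour stays at 8 AM even across the November 4 transition
		$date_time->modify('+1 day');
	}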

Redirecting Unwanted Domains Pointing at Your Content

In the last year I’ve had people pointing unwanted domain names at my own website content. For example, let’s say I have a web site called http://www.foo.com that uses the name server reallycool.nameserver.com. If I wanted to be a pain in the butt I could point another domain (say http://www.annoyingwebsite.com) at the same content by using a custom DNS A record or a 301 redirect. It’s a pretty simple matter to work out the nameservers and IP of a site by using a tool like this.

The problem with someone doing this to your website is that search engines (like Google) see this second website as a duplicate of your own website. Now, in theory this shouldn’t be a problem because Google should determine that your website was the first listed and pretty much ignore the duplicate site. In theory anyway, but the paranoid part of me says having a copy of your website is a Bad Thing™. Another problem is that this spurious second site will show up in search listings and also in your website referrer logs. It’s an annoying, and potentially damaging issue.

I tried a few different things to stop this from happening, including messing about with .htaccess files, but ended up just adding the following to the top of my global header file (which happens to be PHP).

	//
	//Redirect people hijacking site
	//
	if ($_SERVER['HTTP_HOST']!='www.foo.com' && $_SERVER['HTTP_HOST']!='foo.com' && $_SERVER['HTTP_HOST']!='localhost')
	{
		header("HTTP/1.1 301 Moved Permanently");
		header("Location: http://www.google.com");
		exit(); //stop the rest of the page being served to the hijacked domain
	}

Note that I’ve got the localhost entry in there to allow for debugging of my websites on my local PC.