Tuesday, September 3, 2013

Working with CSV and PowerShell for Link Checkers

One of the current tools in my toolbox is Xenu Link Sleuth which does a great job of scanning through our site and finding links that are broken or need updating.  Although while the report it gives is good to review afterwards it is difficult to send out to others, so to make the data useful to my business users I needed to make something that was more readable.  Since PowerShell is something that we use often it seemed an easy choice to write something that reads through the exported data and generates a useful report.

# Crawl Report
$crawlReport = Import-Csv "" -Delimiter "`t"
$crawlList = @()
# Page Map
$siteMap = Import-Csv "" -Delimiter "`t"
$siteList = @()
# Links List
[string] $brokenLinksList = "BrokenLinksList.txt"
[string] $brokenLinksOverview = "BrokenLinksList.csv"
# Clean up from last time
if (Test-Path $brokenLinksList) {
  Remove-Item $brokenLinksList -Force
}
if (Test-Path $brokenLinksOverview) {
  Remove-Item $brokenLinksOverview -Force
}
 
"Just some initial information on what we are dealing with."
"Crawl Report is " + $crawlReport.Length + " lines."
"Site Map is " + $siteMap.Length + " lines."
# Check each Address that has a Status-Code of 404, add it to an array we will
# use to get its origin from the Site Map
foreach ($row in $crawlReport) {
  if ($row.StatusCode -eq "404") {
    $crawlList += $row.Address
  }
}
"We now have " + $crawlList.Count + " broken links to deal with."
# Check in the Site Map for the Address as a Link to Page
foreach ($link in $crawlList) {
  foreach ($line in $siteMap) {
    if ($link -eq $line.LinkToPage) {
      # Pull these together to make it simpler to review later on
      $data = $link + "," + $line.LinkToPage + "," + $line.OriginPage
      $siteList += $data
    }
  }
}
"Overall there are " + $siteList.Count + " broken links to fix."
# Output the Address, Link To Page and Origin Page to a separate file
# since these were put together as CSV's earlier let's just make an
# exported CSV file for ease of use later on
"BrokenLink,LinkToPage,OriginPage" >> $brokenLinksList
foreach ($entry in $siteList) {
  $entry >> $brokenLinksList
}
# Now that we have everything in one file let's make it a CSV
$brokenLinksList | Export-Csv $brokenLinksOverview

This puts the data into a CSV file that can be filtered and sorted in Excel, which puts the data in a format that's more useful for my business users.

No comments: