Quantcast
Channel: SharePoint 2013 - Setup, Upgrade, Administration and Operations forum
Viewing all articles
Browse latest Browse all 21070

Extract Meta Information from .HTML Files for Site Collection

$
0
0

All, I have a need to loop through some .html files in a site collection and parse out some matching text. Was wondering the best way/best practice for this, given the site collection has about 15,000 files I need to loop through. I was trying the below without success. Using SharePoint 2013 and 2016 on-premise currently.

In the .html files I'm looking to extract from the url on the line which as an example is as such:

<meta http-equiv="refresh" content="0;url http://www.kmo.name/uploads/docs/tasks/1003333/01_MAP.pdf

I tried variations of this Powershell without success:

function Get-DocInventory([string]$siteUrl) {
$site = New-Object Microsoft.SharePoint.SPSite $siteUrl

$web = Get-SPWeb -Identity "https://kmo.us.name.com/Sites/orders/"

foreach ($list in $web.Lists) {
if ($list.BaseType -ne “DocumentLibrary”) {
continue
}

foreach ($item in $list.Items) {
$url = select-string -Pattern 'url=http://www.kmo.name/uploads/' -AllMatches | % { $_.Matches } | % { $_.Value }
$data = @{
 "Item URL" = $item.Url
 "Item Name" = $list.Name
 "HTML File" = $url
}
New-Object PSObject -Property $data
}


}
$web.Dispose();

$site.Dispose();
}

Get-DocInventory "https://kmo.us.name.com/Sites/orders/" | Export-Csv -NoTypeInformation -Path "C:\Powershell\OrderDetail_Parse.csv"


Viewing all articles
Browse latest Browse all 21070

Trending Articles



<script src="https://jsc.adskeeper.com/r/s/rssing.com.1596347.js" async> </script>