All, I need to loop through the .html files in a site collection and parse out some matching text, and I'm wondering about the best way / best practice to do this, given that the site collection has about 15,000 files to go through. I'm currently using SharePoint 2013 and 2016 on-premises.
In each .html file I'm looking to extract the URL from the meta-refresh line, which looks like this:
<meta http-equiv="refresh" content="0;url=http://www.kmo.name/uploads/docs/tasks/1003333/01_MAP.pdf">
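The pattern itself seems fine when I test it against that sample line on its own (just a quick console check; the regex and capture group are my own guess at what to match):

$line = '<meta http-equiv="refresh" content="0;url=http://www.kmo.name/uploads/docs/tasks/1003333/01_MAP.pdf">'
# Capture everything after "url=" up to the closing quote
if ($line -match 'url=(http://www\.kmo\.name/uploads/[^"]+)') {
    $matches[1]   # returns http://www.kmo.name/uploads/docs/tasks/1003333/01_MAP.pdf
}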
I tried variations of this PowerShell without success:
function Get-DocInventory([string]$siteUrl) {
    $site = New-Object Microsoft.SharePoint.SPSite $siteUrl
    $web = Get-SPWeb -Identity "https://kmo.us.name.com/Sites/orders/"
    foreach ($list in $web.Lists) {
        if ($list.BaseType -ne "DocumentLibrary") {
            continue
        }
        foreach ($item in $list.Items) {
            $url = Select-String -Pattern 'url=http://www.kmo.name/uploads/' -AllMatches | % { $_.Matches } | % { $_.Value }
            $data = @{
                "Item URL"  = $item.Url
                "Item Name" = $list.Name
                "HTML File" = $url
            }
            New-Object PSObject -Property $data
        }
    }
    $web.Dispose()
    $site.Dispose()
}
Get-DocInventory "https://kmo.us.name.com/Sites/orders/" | Export-Csv -NoTypeInformation -Path "C:\Powershell\OrderDetail_Parse.csv"
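I suspect part of the problem is that the Select-String call above never actually receives the file contents. So at the moment I'm leaning toward something like the rough sketch below: read each file's bytes with OpenBinary, convert them to a string, and only touch items whose names end in .html (the output property names are just placeholders I made up). I'm not sure whether this is reasonable for ~15,000 files or if there's a better-practice approach:

# Rough sketch -- assumes the SharePoint snap-in is available and the .html files are UTF-8
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

function Get-DocInventory([string]$siteUrl) {
    $site = New-Object Microsoft.SharePoint.SPSite $siteUrl
    $web  = $site.OpenWeb()
    try {
        foreach ($list in $web.Lists) {
            if ($list.BaseType -ne "DocumentLibrary") { continue }
            foreach ($item in $list.Items) {
                # Only look at .html files
                if ($item.Name -notlike "*.html") { continue }
                # Read the file contents into a string
                $content = [System.Text.Encoding]::UTF8.GetString($item.File.OpenBinary())
                # Pull the redirect URL out of the meta-refresh line
                if ($content -match 'url=(http://www\.kmo\.name/uploads/[^"]+)') {
                    New-Object PSObject -Property @{
                        "Item URL"     = $item.Url
                        "Item Name"    = $item.Name
                        "Redirect URL" = $matches[1]
                    }
                }
            }
        }
    }
    finally {
        $web.Dispose()
        $site.Dispose()
    }
}

Get-DocInventory "https://kmo.us.name.com/Sites/orders/" |
    Export-Csv -NoTypeInformation -Path "C:\Powershell\OrderDetail_Parse.csv"

From what I understand, an SPQuery with a RowLimit (paging via ListItemCollectionPosition) would be kinder to large libraries than enumerating $list.Items directly, but I haven't gotten that far yet. Is that the way to go here?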