PowerShell – Split a Text File – Output With Delimiter As File Name

    #504745

    Folks,

    Good day to one and all.

    I'm a recent PowerShell newbie.

    I want to split a text file into smaller ones.

    The name of each output file sits between two placeholders, XXX and ZZZ:

    XXX File Name 1 ZZZ

    some content here

    XXX Filename2 ZZZ

    some content here

    I have been grappling with the code below:

    http://superuser.com/questions/466363/how-to-split-a-text-file-into-multiple-text-files

    Code:
    # Split a Text File  - By A Delimiter
    
    
    $Path = "C:\Users\PBL\Desktop\b"     # Folder Containing the Text file  - And where Files will be Split
    
    
    $InputFile = (Join-Path $Path "b.txt")
    
    
    $Reader = New-Object System.IO.StreamReader($InputFile)
    
    While (($Line = $Reader.ReadLine()) -ne $null) {
        If ($Line -match " [regex]'(XXX)*(ZZZ)' {                          # This is wrong
            $OutputFile = $matches[1] + ".txt"                               # This is wrong
        }
    
        Add-Content (Join-Path $Path $OutputFile) $Line
    }
    
    
    

    To summarize:

    The file name of each output file sits between the markers, as in

    XXX filename ZZZ (I couldn't think of anything more imaginative).

    I want to split one large text file every time it hits a delimiter line of the form XXX filename ZZZ.

    I hope I'm making more sense than last time. I've been all over and just can't find anything to fix it.

    thanks folks and RG,

    really appreciative of all the help:)

    pb

    • #1554639

      The issue is your match. $matches[1] refers to the (XXX) group, but you want it to hold the file name. Parentheses create a capture group, and $matches[n] returns the text captured by group n. Try this line:
      If ($Line -match "XXX(.*)ZZZ") {

      cheers, Paul
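
To see what the capture group returns, here is a minimal sketch. The sample line is made up to mirror the markers in the question, and the Trim() call is an addition to drop the spaces around the name.

```powershell
# Hypothetical sample line, mirroring the XXX ... ZZZ markers from the question.
$Line = 'XXX File Name 1 ZZZ'

if ($Line -match 'XXX(.*)ZZZ') {
    # $Matches[0] is the whole match; $Matches[1] is the text inside the parentheses.
    $OutputFile = $Matches[1].Trim() + '.txt'
}

$OutputFile   # File Name 1.txt
```

The group sits between the two markers, so the markers themselves never end up in the file name.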

    • #1554691

      Hello Paul,

      thank you for your help

      Does the code below look right?

      Code:
      While (($Line = $Reader.ReadLine()) -ne $null) {
          
          If ($Line -match "XXX(.*)ZZZ") {
      
                                     
              $OutputFile = $matches[1] = XXX  + ".txt"
          }
      
          Add-Content (Join-Path $Path $OutputFile) $Line
      }
      
      

      I ran it, but it did not split the file.

      I think we are nearly there – it looks a lot more logical

      cheers:)

      pb

    • #1554771

      PB,

      A slightly different approach…

      Code:
      $FNPattern = [RegEx] '(xxx)*(zzz)'
      
      $Path = "G:\Test\"       # Folder Containing the Text file  - And where Files will be Split
      
      $SourceData = Get-Content -Path "$($Path)SplitTest.txt"
      
      ForEach ($Line in $SourceData) {
      
        If ($Line -match $FNPattern) {
          # Delimiter line: pull the file name out from between xxx and zzz
          $Part = $line.split("xxx")
          $FN = $Part[3].split("zzz")
          $CurrentFN = $Path + "$($FN[0].trim())" + ".txt"
        }
        Else {
          # Content line: append it to the current output file
          Add-Content -Path "$CurrentFN" -Value $Line
        }
      }
      

      Source File: SplitTest.txt

      Code:
      xxx FirstFile zzz
      FirstFile line 1
      FirstFile line 2
      FirstFile line 3
      FirstFile line 4
      FirstFile line 5
      FirstFile line 6
      xxx SecondFile zzz
      SecondFile line A
      SecondFile line B
      SecondFile line C
      SecondFile line D
      

      Results: FirstFile.txt

      Code:
      FirstFile line 1
      FirstFile line 2
      FirstFile line 3
      FirstFile line 4
      FirstFile line 5
      FirstFile line 6
      

      Results:SecondFile.txt

      Code:
      SecondFile line A
      SecondFile line B
      SecondFile line C
      SecondFile line D
      

      HTH :cheers:

      May the Forces of good computing be with you!

      RG

      PowerShell & VBA Rule!

    • #1554773

      RG,

      Thank you, it split the file nicely. 🙂

      How do I name each file after the match?

      XXX File Name 1 ZZZ >> File Name 1.txt
      XXX File Name 2 ZZZ >> File Name 2.txt

      etc.

      $CurrentFN = $Path + $FNPattern + "$($FN[0].trim())"

      Do I add the pattern match to it?

      thanks RG

      pb

    • #1554780

      PB,

      I don’t understand the question as the code as currently written will handle any legal file name, including ones w/spaces, and add the .txt extension.

      HTH :cheers:

      RG

    • #1554795

      Hi RG,

      Aww, I hope I'm not muddying things up again. It's been a long day.

      When you have a spare moment, would you be kind enough to see if it splits this file?

      43781-File

      I tried, and for some reason it did not split it.

      It does name the files as demonstrated in your post, so please ignore my previous post. I've got PowerShell mania today.

      cheers:)

      pb

    • #1554801

      PB,

      It works fine if you change the code items from xxx zzz to XXX ZZZ. I’m working on making it work w/either upper or lower case. HTH :cheers:

      RG
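
For background on the upper/lower case issue, here is a small sketch of how PowerShell treats case when the pattern is a plain string. The sample line is made up; none of this code is from the thread.

```powershell
# Hypothetical sample line with upper-case markers.
$Line = 'XXX File Name 1 ZZZ'

$Insensitive = $Line -match  'xxx(.*)zzz'   # True:  -match ignores case with a string pattern
$Sensitive   = $Line -cmatch 'xxx(.*)zzz'   # False: -cmatch is the case-sensitive variant
$DotNet      = $Line.Contains('xxx')        # False: .NET string methods compare case-sensitively

# A string pattern with a capture group therefore handles either case.
if ($Line -match '^\s*xxx(.*)zzz\s*$') {
    $Name = $Matches[1].Trim()              # File Name 1
}
```

The .NET string methods such as Contains and Split compare characters exactly, which is why lower-case arguments do not find upper-case markers.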

    • #1554803

      PB,

      Ok here’s a revision that works in all cases of file names & delimiters.

      Code:
      $FNPattern = [RegEx] '([xX]{3})*([zZ]{3})'
      
      $Path = "G:\Test\"  # Folder Containing the Text file 
                          # And where result Files will be placed
      
      $SourceData = Get-Content -Path "$($Path)SplitTest.txt"
      
      ForEach ($Line in $SourceData) {
      
        If ($Line -match $FNPattern) {
          # Delimiter line: drop the 3-character markers at each end
          $FN = $Line.Substring(3,($Line.Length-6))
          $CurrentFN = $Path + "$($FN.trim())" + ".txt"
        }
        Else {
          # Content line: append it to the current output file
          Add-Content -Path "$CurrentFN" -Value $Line
        }
      
      }  #End ForEach

      Test file:

      Code:
      XXX  File Name 1 ZZZ
      
      Sed ut perspiciatis unde omnis iste natus error sit voluptatem accusantium doloremque laudantium, totam rem aperiam, eaque ipsa quae ab illo inventore veritatis et quasi architecto beatae.
      
      
      XXX File Name 2  ZzZ
      
      Neque porro quisquam est, qui dolorem ipsum quia dolor sit amet, consectetur, adipisci velit, sed quia non numquam eius modi tempora incidunt ut labore et dolore magnam aliquam quaerat voluptatem. Ut enim ad minima veniam, quis nostrum exercitationem ullam corporis suscipit laboriosam, nisi ut aliquid ex ea commodi consequatur? 
      
      XXx FileName 3  ZZZ
      Neque porro quisquam est, qui dolorem ipsum quia dolor sit amet, consectetur, adipisci velit, sed quia non numquam eius modi tempora incidunt ut labore et dolore magnam aliquam quaerat voluptatem. Ut enim ad minima veniam, quis nostrum exercitationem ullam corporis suscipit laboriosam, nisi ut aliquid ex ea commodi consequatur? 
      

      HTH :cheers:

      RG

    • #1554809

      RG,

      You are too awesome and too kind.

      I have tested it and it worked like a charm. :clapping:

      It's all your fault: you have been showing how awesome PowerShell is, and now I have got the bug.

      VBA is good, but it gets messy having to convert text files to docx in order to split them, or to do a mail merge split.

      Sometimes you just want to SPLIT A TEXT FILE, no frills attached.

      Thanks for the awesome – generous help :D:

      Really indebted

      Have a great weekend now:cheers:
      cheers

      pb

    • #1554810

      PB,

      Thank you for the kind words.

      I was looking at the code again and realized I didn’t allow for the case where there could be spaces before/after the delimiters. So here’s a minor change to fix that.

      Code:
      $FNPattern = [RegEx] '([xX]{3})*([zZ]{3})'
      
      $Path = "G:\Test\"  # Folder Containing the Text file 
                          # And where result Files will be placed
      
      $SourceData = Get-Content -Path "$($Path)SplitTest.txt"
      
      ForEach ($Line in $SourceData) {
      
        If ($Line -match $FNPattern) {  
          # Trim first so spaces around the delimiters do not shift the Substring offsets
          $FN = $Line.Trim() 
          $FN = $FN.Substring(3,($FN.Length-6))
          $CurrentFN = $Path + "$($FN.trim())" + ".txt"
        }
        Else {
          Add-Content -Path "$CurrentFN" -Value $Line
        }
      
      }  #End ForEach
      

      HTH :cheers:

      RG

    • #1554814

      PB,

      Here's an improved version that simplifies the code for determining the file name and also adds two switches. One, if present, strips blank lines from the resulting files. The other deletes any output file that already exists in the target directory; without it, the new data is appended to the existing file.

      Code:
      Param (
      
        [Parameter(Mandatory=$false)]
          [Switch] $OldFileDelete,      # Delete an existing output file before writing to it
        [Parameter(Mandatory=$false)]
          [Switch] $StripBlankLines     # Leave blank lines out of the output files
      )
      
      $FNPattern = [RegEx] '([xX]{3})*([zZ]{3})'
      
      $Path = "G:\Test\"  # Folder Containing the Text file 
                          # And where result Files will be placed
      
      $SourceData = Get-Content -Path "$($Path)SplitTest.txt"
      
      ForEach ($Line in $SourceData) {
      
        If ($Line -match $FNPattern) {  
          # Delimiter line: trim it, then drop the 3-character markers at each end
          $FN = $Line.Trim() | ForEach-Object {$_.Substring(3,($_.Length-6))}  
          $CurrentFN = $Path + "$($FN.trim())" + ".txt" 
      
          If ($OldFileDelete.IsPresent -and (Test-Path -Path "$CurrentFN")) {
            Remove-Item -Path "$CurrentFN"
          }
        }
        ElseIf (-not ($StripBlankLines.IsPresent -and ($Line.Trim().Length -eq 0))) {
          # Content line: append it unless it is blank and -StripBlankLines was given
          Add-Content -Path "$CurrentFN" -Value $Line
        }
      
      }  #End ForEach
      
      

      Of course you could also add a parameter for the directory and one for the source file also. The possibilities are endless.

      I hope this helps in your quest for PowerShell Mastery.

      HTH :cheers:

      RG
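
The suggestion above, a parameter for the directory and one for the source file, might look like the sketch below. The function name Split-TextFile and the parameter names SourceFile and OutputFolder are made up for illustration. The sketch also swaps the Substring arithmetic for a capture group in a plain string pattern, which is a different technique from the posted script.

```powershell
function Split-TextFile {                      # hypothetical name
    param (
        [Parameter(Mandatory = $true)] [string] $SourceFile,    # file to split
        [Parameter(Mandatory = $true)] [string] $OutputFolder,  # where the pieces go
        [switch] $OldFileDelete,
        [switch] $StripBlankLines
    )

    $CurrentFN = $null
    foreach ($Line in Get-Content -LiteralPath $SourceFile) {
        if ($Line -match '^\s*xxx(.*)zzz\s*$') {
            # Delimiter line: the capture group holds the file name.
            $CurrentFN = Join-Path $OutputFolder ($Matches[1].Trim() + '.txt')
            if ($OldFileDelete -and (Test-Path -LiteralPath $CurrentFN)) {
                Remove-Item -LiteralPath $CurrentFN
            }
        }
        elseif ($null -ne $CurrentFN -and
                -not ($StripBlankLines -and $Line.Trim().Length -eq 0)) {
            # Content line: append it to the current output file.
            Add-Content -LiteralPath $CurrentFN -Value $Line
        }
    }
}

# Example call (paths are placeholders):
# Split-TextFile -SourceFile 'G:\Test\SplitTest.txt' -OutputFolder 'G:\Test' -StripBlankLines
```

Lines that appear before the first delimiter are skipped here, since there is no file name to write them to yet.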

    • #1554868

      RG,

      what a treat thank you so much

      Have a great week end

      pb:D

    • #2394614

      RG,

      Is it possible to modify this for very large text files (~3-5 GB) using something like StreamReader instead of Get-Content?

       

      Thanks,

      WGS
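
One possible approach to the large-file question is to read and write through .NET streams, so that only one line is held in memory at a time. This is a sketch, not something posted in the thread. The function name Split-LargeTextFile and its parameter names are made up, and it uses a capture group in place of the Substring arithmetic.

```powershell
function Split-LargeTextFile {                 # hypothetical name
    param (
        [Parameter(Mandatory = $true)] [string] $SourceFile,    # full path to the big file
        [Parameter(Mandatory = $true)] [string] $OutputFolder   # full path for the pieces
    )

    $Reader = New-Object System.IO.StreamReader($SourceFile)
    $Writer = $null
    try {
        # ReadLine() returns $null at end of file.
        while ($null -ne ($Line = $Reader.ReadLine())) {
            if ($Line -match '^\s*xxx(.*)zzz\s*$') {
                # Delimiter line: close the previous piece and open the next one.
                if ($Writer) { $Writer.Dispose() }
                $OutFile = Join-Path $OutputFolder ($Matches[1].Trim() + '.txt')
                # $true = append, which mirrors what Add-Content does in the scripts above.
                $Writer = New-Object System.IO.StreamWriter($OutFile, $true)
            }
            elseif ($Writer) {
                $Writer.WriteLine($Line)
            }
        }
    }
    finally {
        # Always release the file handles, even if something fails part way.
        if ($Writer) { $Writer.Dispose() }
        $Reader.Dispose()
    }
}

# Example call (paths are placeholders):
# Split-LargeTextFile -SourceFile 'G:\Test\SplitTest.txt' -OutputFolder 'G:\Test'
```

Pass full paths to both parameters: the .NET stream classes resolve relative paths against the process working directory, which can differ from the current PowerShell location.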
