• Want command-line utility to search single file for multiple strings

    #488590

    I have a single log-type ASCII text file whose lines come in a significant number of possible line formats (perhaps 50?).
    I want to test each line to see if it contains one of a number of unique strings, and write the line to an output file if any of these required strings is found.
    If the line contains none of the required strings, or contains one of a number of NON-required strings, the line is to be ignored.

    The strings would usually contain more than one word separated by blanks, and the line will usually contain at least one email address wrapped in angle brackets.

    The built-in commands FIND and FINDSTR really only handle a single string, without errors, so up to 50 runs against the same input file would be somewhat inefficient!

    Does anyone know of a command-line utility which would do this? (If anyone has a UK IBM mainframe background, what I’m really asking for is a free version of the SELCOPY utility!)

    BATcher

    Plethora means a lot to me.

    • #1384619

      This would be a cinch in Perl using regular expressions! However a quick Google for ‘DOS Regular expressions’ comes up with some suggestions e.g.
      http://www.computerhope.com/findstr.htm
      http://www.2150.com/regexfilter/Documentation/regular_expressions.asp

      Eliminate spare time: start programming PowerShell

    • #1384620

      Probably wouldn’t be too hard to write a small C# app to do that, using regular expressions.

    • #1384653

      I don’t think regular expressions would assist greatly – it’s simply a multiple string-matching problem.
      if line contains “string 1” then write it to output-file
      if line contains “string 2” then write it to output-file

      if line contains “string z” then ignore line
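      The include/ignore rules listed above can be sketched in a few lines of Python. This is only an illustration of the logic, not any of the utilities suggested in this thread; the name filter_lines and its parameters are made up for the example, and it assumes plain, case-sensitive substring matching.

```python
def filter_lines(lines, required, rejected):
    """Yield each line that contains a required string and no rejected string."""
    for line in lines:
        # A rejected (NON-required) string always wins: ignore the line.
        if any(text in line for text in rejected):
            continue
        # Keep the line as soon as any one required string is found.
        if any(text in line for text in required):
            yield line


def filter_file(in_path, out_path, required, rejected):
    """Read in_path once and write the surviving lines to out_path."""
    with open(in_path, encoding="ascii", errors="replace") as src, \
            open(out_path, "w", encoding="ascii", errors="replace") as dst:
        dst.writelines(filter_lines(src, required, rejected))
```

      The input file is read only once, however many strings are in the two lists, which avoids the repeated runs that FIND or FINDSTR would need.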

      And it would be extremely hard to write any application other than BATch for me – the last computer language programming I did was in IBM Assembler-360 and Rexx. There ain’t much of that around no more.

      BATcher

    • #1384655

      Well, even for whole string matching, using a regular expression can make things easy, although it seems that wouldn’t be needed here.
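      As a rough illustration of how a regular expression could help with whole-string matching, the sketch below joins a list of literal strings into a single pattern so each line is tested once. The helper name build_matcher is hypothetical, and it assumes a non-empty list of case-sensitive strings.

```python
import re


def build_matcher(strings):
    """Compile one pattern that matches any of the given literal strings."""
    # Try longer strings first so a long string is preferred over its own prefix.
    ordered = sorted(strings, key=len, reverse=True)
    # re.escape makes characters such as '.', '<' or '[' match literally.
    return re.compile("|".join(re.escape(text) for text in ordered))


if __name__ == "__main__":
    matcher = build_matcher(["Nigel Molesworth", "AUSearcher Search"])
    for line in ["mail from Nigel Molesworth", "something else"]:
        if matcher.search(line):
            print(line)
```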

    • #1384656

      Couldn’t you use PowerShell? See some of the links at “powershell search text file for string” for a starting point. In particular, see “Hey, Scripting Guy! How can I use Windows PowerShell to Search a Text File for multiple strings?”.

      Joe

    • #1384665

      BATcher,

      Here’s a powershell program that will do the trick.

      Code:
      param (
        [string]$ExistingFile = "WindowsUpdate.log",
        [string]$NewFile = "PSProcessed.txt",
        [string]$DriveDirPath  = "G:BEKDocsScripts"
      )
      # ******* Setup Section *********
       $LinesProcessed = 0
       $LinesMatched = 0
       $MatchStrings = @("*Process:*", "*AUSearcher Search*")
       $MatchCnt = $MatchStrings.Count
       # Join-Path supplies the separator between the directory and the file name
       $InFile  = Join-Path $DriveDirPath $ExistingFile
       $OutFile = Join-Path $DriveDirPath $NewFile
       remove-item $OutFile -ErrorVariable Errs 2>$null
      # *******End of Setup Section ********
      
      ForEach( $Line in get-content $InFile ) {
         $LinesProcessed++
         For( $Cnt = 0 ; $Cnt -lt $MatchCnt; $Cnt++) {
      
           # Write the line on the first match, then stop checking this line
           if($Line -like $MatchStrings[$Cnt] ) {
             add-content $OutFile $Line
             $LinesMatched++
             break
           }  #End If
      
         }  #End For 
      
      }   #End ForEach
      
      Write-Host "$LinesProcessed lines were tested. `n"  `
                 "$LinesMatched lines were matched. `n"  `
                 $Errs.count " errors encountered."
      

      Notes:
      1. You need to change the $Matchstrings = @(…..) array. Just replace the ….. with a list of the phrases you want to search for and add a wildcard * on either side. See the example. I’ve only got 2 items in the array but you can place as many as you need.

      2. All the parameters have defaults (lines 2-4). When you call the program you can override the defaults:
      .\ProccessTextFile.ps1 -ExistingFile filespec -Newfile filespec -DriveDirPath d:\path\path…
      As written, the source and output files have to be in the same directory; of course this could be changed. You only need to include the parameters you want to change, and they can be in any order.

      3. Type powershell in the search box to get a PowerShell prompt to get started.

      Here is the sample output from 2 successive runs:
      [Screenshot: 33573-PowerShellRun]
      Note the second one shows an error. That’s because I deleted the destination file and when I tried to remove it it wasn’t there. If you get more than 1 error something’s rotten in Mudville! 😆

      HTH :cheers:

      May the Forces of good computing be with you!

      RG

      PowerShell & VBA Rule!

      • #1384883

        Hi there, I had a need to do this to extract Argos (an eVisions reporting tool) actual report runtime events from a system logfile with loads of other stuff in it ages ago, and consequently it occurred to me that if I made the thing more versatile, it would save me from trying to resurrect my oft-rusting Perl/RegExp skills when I needed to do this in future with other text files.

        So, I wrote a small utility using AutoIT and dubbed it Textreme, and you can download it from my website to see if it might come some way towards meeting your needs.

        It’s available at http://www.jollybean.co.uk under the Textreme menu item on the right.

        If you DO give it a try I would be interested to know if it was of use to you!

        Best regards,

        Jim.

        [Edit] I just tried it and it appears that the Include function does not ‘cascade’ properly so that you would have to make several passes — sorry about that but thanks for highlighting a ‘feature’ that I need to work on!

        [Edit2] Okay, I think I have got both the Include and Exclude functions cascading correctly now. If you can wait until tomorrow, I should be able to upload the latest version to my site this evening (UK time). I had been meaning to do this anyway because I recently added another function (‘Move’) as a result of an article about editors written by Verity Stob on The Register website.

    • #1385004

      Hi Batcher,

      At work I have had to do that a bunch. On the Linux side it’s easy with the command “grep”. In order to be portable, I have installed Cygwin on my WinXP laptop so it acts like Linux. Then using Perl I have written several different grep-like programs for various specific functions. For your purpose that’s likely overkill.

      How about a windows version of grep?
      see http://www.wingrep.com/index.htm

      I have never used it, but under “Features” it does claim to have a command-line interface.
      If nothing else it may give you another term to google for….

      Good Luck!
      brino

    • #1385086

      RetiredGeek – it looks as if my reply to you yesterday has disappeared! I have never used PowerShell (at least knowingly), but will see if I can get your program to work in my circumstances.

      Jim – your mention of Verity Stob (one of my literary/literate heroines, like Lucy Kellaway) makes it compulsory for me to try your utility! (But tomorrow…)

      Brino – I wasn’t aware that grep handled such a large number of search strings, but I will look into it.

      BATcher

    • #1385099

      BATcher,

      Another candidate from the Unix/Linux camp may be awk (or gawk). I believe it is also available in the Cygwin package that brino mentioned (http://cygwin.com/packages/). Another implementation is at http://gnuwin32.sourceforge.net/packages/gawk.htm.

      mo.eu

    • #1385134

      BATcher, I have a program that I believe does what you are looking for. See the program titled String Search Counter (which is the very first program listed) on my page, http://www.billanddot.com/downloads.htm .

      Cheers,

      Bill P.

    • #1385171

      BATcher,
      I just noticed this

      or contains one of a number of NON-required strings

      The PowerShell code I posted does not currently do this part. Sorry I missed it on 1st reading. I’ll try to get it updated. :cheers:

      RG

    • #1385180

      BATcher,

      Here’s version 2 that adds the capability to exclude matched records with exclusion strings!

      Code:
      param (
        [string]$ExistingFile = "WindowsUpdate.log",
        [string]$NewFile = "PSProcessed.txt",
        [string]$DriveDirPath  = "G:BEKDocsScripts"
      )
      
      Function ExcludeText([String]$LineToCheck,$ExcludeList) {
         $ListCnt = $ExcludeList.count
         $WriteRecord = "YES"
         For($ExclCnt = 0 ; $ExclCnt -lt $ListCnt ; $ExclCnt++) {
      
            If($LinetoCheck -like $ExcludeList[$ExclCnt]) {
              $WriteRecord = "NO"
              Break
            } #End If
      
         } #End For
      
         $WriteRecord
      
      }  #End ExcludeText
      
      # ******* Setup Section *********
       $LinesProcessed = 0
       $LinesMatched = 0
      
       #  *** Array for strings to MATCH/Select Records ***
       $MatchStrings = @("*Process:*", "*AUSearcher Search*")
      
       #  *** Array for strings to Exclude matched strings!  ***
       $ExcludeStrings=@("*ReScan = FALSE*","*WinAudit.exe*")
       $MatchCnt = $MatchStrings.Count
       # Join-Path supplies the separator between the directory and the file name
       $InFile  = Join-Path $DriveDirPath $ExistingFile
       $OutFile = Join-Path $DriveDirPath $NewFile
       remove-item $OutFile -ErrorVariable Errs 2>$null
      # *******End of Setup Section ********
      
      ForEach( $Line in get-content $InFile ) {
         $LinesProcessed++
         For( $Cnt = 0 ; $Cnt -lt $MatchCnt; $Cnt++) {
      
           If($Line -like $MatchStrings[$Cnt] ) {
      
             # A matched line is written only if no exclusion string is found in it
             $Result = ExcludeText -LineTocheck $Line -ExcludeList $ExcludeStrings
             If($Result -eq "YES"){ 
               add-content $OutFile $Line
               $LinesMatched++
               Break
             }
           }  #End If
      
         }  #End For 
      
      }   #End ForEach
      
      Write-Host "$LinesProcessed lines were tested. `n"  `
                 "$LinesMatched lines were matched. `n"  `
                 $Errs.count " errors encountered."

      BTW: I used the WindowsUpdate.log file as my test file; I just copied it to the same directory as the script.

      HTH :cheers:

      RG

    • #1385286

      OK BATcher the ‘latest & greatest’ Textreme is now uploaded to http://www.jollybean.co.uk and ready to try. Make sure you keep the 3 files (exe, ini and chm) local to one another after extraction.

      The pop-up screen shot shows v2.3 (grr, only just noticed that!) but v2.4 is in the zip file. Because of the website builder I have used (WebPlus X6) I have to recompile the whole page to fix that but I can upload the new zip file no problem. Please let me know how you get on with it!

      Best regards,

      Jim.

      • #1385341

        Hi Jim,

        Could you post in the TEXTreme page,
        a link to the Help file document?
        (so we can read it online,
        w/o needing to d/l the zip file..).

        -or-

        show some concrete examples of using TEXTreme,
        (in either your web page and/or in this Thread)?

        Examples of using the “Move” option,
        would be nice…

        Thanks!
        SF99

        btw:
        I like the name you gave it: TEXTreme

      • #1386485

        You can keep using Rexx on Windows.

        • #1386509

          You can keep using Rexx on Windows.

          And no doubt Xedit too, but I haven’t used either of them for about 10-15 years!

          What I really do want is a port of the IBM PC/DOS “E” editor to a 64-bit environment – it works on 32-bit (with 8.3 filenames), but it’s really a 16-bit program, not surprisingly…

          BATcher

    • #1385456

      Jim – I don’t have the log files with me at home, but I downloaded Textreme and unzipped it to D:, and tried the .CHM help file – but all of the entries in the LH pane produce nothing in the RH pane! Am I doing something wrong? I note there is a .CHW file, created on first opening the .CHM file, which may or may not be relevant.

      A little later…
      Having just run the .EXE file, it looks VERY promising and (on the face of it!) it seems that it will do all that I want “and more”!
      Do I need to specify strings with quotes, as in “Nigel Molesworth” or leave them out as in Nigel Molesworth ?
      Any problems with the file lines including characters which cause problems in BATch files, like ‘ ” % ^ & and so on?

      BATcher

      • #1385558

        Okay folks, some replies:

        BATcher – you don’t need quotes (in fact if you DO use them it will treat them as part of the search string!) and the only ‘special characters’ you need to worry about are the square brackets because these are used by the statement parser to detect the parameters. If you need to use square brackets in your parameters then there is a line in the ini file that will temporarily switch them with the characters of your choice (set to the curly brackets by default) so that the parser will not get confused. On the subject of the help file, I seem to recall having to install something from the M$ support area in order for Win7 to be able to read .chm (compiled html) helpfiles (or you might want to try some of the things mentioned here first: http://www.techulator.com/resources/4302-How-open-chm-file-Windows.aspx )

        SF99 – the best example I can offer is from The Register’s Verity Stob where she wanted an editor (Textreme isn’t one, BTW) that could do what she referred to as “munging”. The example she gave was to turn this:

        0001, 3000, Adam, London
        0002, 1000, Bonnie, Manchester
        0003, 2000, Catherine, Edinburgh
        0004, 3000, David, New York
        (continues like this for 20 more lines)

        into this:

        db.add("0001", "Adam", "London", 3000);
        db.add("0002", "Bonnie", "Manchester", 1000);
        db.add("0003", "Catherine", "Edinburgh", 2000);
        db.add("0004", "David", "New York", 3000);

        At the time Textreme could do it all except the shift of the second field to the end which was what prompted me to add the ‘Move’ command.
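        For readers without Textreme, the same “munging” can be sketched in Python. This is not how Textreme does it; the function name munge is invented for the example, and it assumes every input line has exactly four comma-separated fields with no commas inside a field.

```python
def munge(line):
    """Turn '0001, 3000, Adam, London' into a db.add(...) statement."""
    # Split on commas and trim the blanks around each field.
    ident, amount, name, city = [field.strip() for field in line.split(",")]
    # Quote the text fields and move the second field (the amount) to the end.
    return f'db.add("{ident}", "{name}", "{city}", {amount});'


if __name__ == "__main__":
    for row in ["0001, 3000, Adam, London", "0004, 3000, David, New York"]:
        print(munge(row))
```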

        Here are some screenshots of the function list that does it and the “First 25 Lines” window (not sure if I have uploaded/attached them correctly but we shall see!):

        [Screenshots: 33601-Textreme, 33602-First25]

        Hope this helps, any more questions/problems, either post them here or click the “Send the Jollybean webmaster an email” button on the website.

        Best regards,

        Jim.

    • #1385702

      With regard to a downloaded help .CHM file not producing anything in the Contents pane in Windows 7, the answer is to right-click on the file, choose Properties, then click the Unblock button. (Wouldn’t it have been nice to be told this when the problem arises?!)

      Because of assorted disasters, I won’t be able to look at anyone’s kind contributions to the solution of my request until later next week… Apologies!

      BATcher

    • #1386516

      Ah Rexx and XEdit the Dynamic Duo I remember them well! 😆 :cheers:

      RG

      • #1387094

        Sorry to interrupt your nostalgia trip folks (you’ll be waxing lyrical over vi next!) but just to let anyone know that might be interested, release 2.5 of Textreme is now available for download.

        Regards,

        Jim

    • #1389158

      Jim,

      I suspect that you are confusing CHM (Compiled HTML) files with HLP (classic Windows Help) files, which they superseded. Windows 7 reads CHM files out of the box. However, if you also need to read Windows Help (HLP) files, you must download and install the Windows Help program, WinHelp32.exe, which is freely available from the Microsoft Product Support Services Web site, at http://www.microsoft.com/en-us/download/details.aspx?id=91.

      David A. Gray

      Designing for the Ages, One Challenge at a Time

      • #1395910

        Um, just curious… If you look at my earlier post in this thread, I noted that a decade or so ago I wrote a program that I believe does what you want to do (if I understand the requirements correctly). Have you downloaded that program and tried to use it?

    • #1396039

      Yes, I did try it, but it wasn’t complete enough for my needs, I’m afraid. Then Textreme came along, and I have been experimenting with it ever since, and the author, Jim Ollerhead, has very kindly created a Command Line version, initially for me! I think his latest version will soon be available.

      BATcher

      • #1396069

        Okay, just checking. I thought the option in my program in which you make a text file of multiple specific targets would fit your needs, but I guess not. Thanks for getting back to me.
