Handling large log files

Apr 7, 2011 at 10:54 AM

Hi,

Do you have any plans to handle large log files? Currently you get an out-of-memory exception and the viewer is torn down.

Thanks,

Kevin

Developer
Apr 8, 2011 at 4:49 AM

Hi Kevin

Currently we are stabilizing the code base and bringing in certain "must-haves". When you say large, how large are we talking about?

Developer
Jun 1, 2012 at 3:07 PM
Edited Jun 1, 2012 at 4:37 PM

I found this same original project and have been modifying it the past day to account for this very issue.

My log files are upwards of 500MB! 

The biggest problem, it seems, is the need to read the whole file into a string, nest the <log4j:EventName> elements inside a <root> element, and then parse it with the XmlTextReader.
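
Roughly, the current code does something like the following (a sketch; logFilePath is just an illustrative variable). The file and the wrapped copy both live in memory at once, which is what blows up on big logs:

// requires: using System.IO; using System.Xml;
string content = File.ReadAllText(logFilePath);      // whole file in memory
string wrapped = "<root>" + content + "</root>";     // second large string

using (var reader = new XmlTextReader(new StringReader(wrapped)))
{
    while (reader.Read())
    {
        // build log items from each event element...
    }
}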

I've tried MANY different methods to make this work, but it seems impossible.

For now, I've committed to getting rid of the read-as-string-first code, and I simply append the root node manually in TextPad or Notepad before opening the file...


// requires: using System.IO; using System.Xml;
using (var fs = new FileStream(...))
using (var bs = new BufferedStream(fs))
using (var sr = new StreamReader(bs))
using (var xmlTR = new XmlTextReader(sr))   // dispose the reader too
{
    // stream the file node by node instead of loading it all at once
    while (xmlTR.Read())
    {
        // ...
    }
}

I'd love a better solution. 
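
For what it's worth, one idea I haven't fully proven out: XmlReaderSettings has a ConformanceLevel.Fragment mode that can read multiple top-level elements with no root at all, provided the log4j prefix is bound through an XmlParserContext. A rough sketch (the path is illustrative, and I'm assuming the standard log4j namespace URI):

using System.IO;
using System.Xml;

var nt = new NameTable();
var nsMgr = new XmlNamespaceManager(nt);
nsMgr.AddNamespace("log4j", "http://jakarta.apache.org/log4j/");
var context = new XmlParserContext(nt, nsMgr, null, XmlSpace.Default);

var settings = new XmlReaderSettings
{
    // allow multiple top-level elements, i.e. no <root> needed
    ConformanceLevel = ConformanceLevel.Fragment
};

using (var fs = File.OpenRead(logFilePath))
using (var reader = XmlReader.Create(fs, settings, context))
{
    while (reader.Read())
    {
        // handle each event element as it streams past...
    }
}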

Another thing I wanted to add was a progress indicator, and to keep the load from blocking the UI.

Developer
Jun 1, 2012 at 3:12 PM
Edited Jun 1, 2012 at 4:37 PM

After briefly looking at the updated source, the new ability to merge-add logs is perhaps a suitable solution to this problem.

I guess the idea is that, rather than reading a single large log file, you configure the logging application to chunk the files at a reasonable size, then read them merged?
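
Something like this, perhaps (the file naming and LoadLogFile are just illustrative):

// requires: using System.IO; using System.Linq;
var chunks = Directory.GetFiles(logFolder, "app.log.xml.*")
                      .OrderBy(f => f);   // keep chronological order

foreach (var chunk in chunks)
{
    // each chunk is small enough for the existing reader
    LoadLogFile(chunk);
}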

Developer
Jun 4, 2012 at 3:02 PM
Edited Jun 4, 2012 at 3:03 PM

If the large file can be assumed to have a root node, then the XmlTextReader can be used to parse a file of any size.

Rather than reading the entire file into a string, appending the root node in memory, and then reading that in-memory string as XML, we could instead save the root node with the file.

Open the file and check the first block of characters to see if it has a <root> node. If it does, close the file and let the XmlTextReader do its thing (maybe check that it has an end node too, just to be safe?).

If there is no root node, then add it to the beginning and end of the file without reading the whole thing into memory, save it, then read as normal, something like the sketch below.
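
A rough sketch of what I mean (paths and buffer size are illustrative). One catch: we can't literally prepend inside the original file, so "adding" the root node means rewriting through a temporary file:

using System.IO;

bool hasRoot;
using (var sr = new StreamReader(logFilePath))
{
    // peek at the first block of characters only
    var firstBlock = new char[1024];
    int read = sr.Read(firstBlock, 0, firstBlock.Length);
    hasRoot = new string(firstBlock, 0, read).TrimStart().StartsWith("<root>");
}

if (!hasRoot)
{
    var tempPath = logFilePath + ".tmp";
    using (var input = new StreamReader(logFilePath))
    using (var output = new StreamWriter(tempPath))
    {
        output.Write("<root>");
        var buffer = new char[81920];        // stream in small blocks
        int n;
        while ((n = input.Read(buffer, 0, buffer.Length)) > 0)
            output.Write(buffer, 0, n);
        output.Write("</root>");
    }
    File.Delete(logFilePath);
    File.Move(tempPath, logFilePath);        // "save it"
}
// now let the XmlTextReader do its thing as usual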

Thoughts?

Coordinator
Jun 25, 2012 at 2:46 PM

The way the app reads and parses XML files can be a problem with very big files (500 MB! that's a log!).

Some improvements can surely be made. But it is not possible to write anything to the original files: the app must deal with files that have their own life cycle. For example, I often use the app to read a log file while the application that owns it is still writing lines to it. This is why there is a "refresh" function: to reload files that have changed since they were loaded.

Writing a header/footer into a log can break any further writes to it, so this is not a valid solution. The app must add what it needs without modifying the original files.

One can then think about a cache folder where modified copies are stored along with a CRC: if the user reopens the same file and the CRC has not changed, the cached copy is opened instead of the original. If the file has changed, it is modified and stored in the cache folder again.

This can be a solution, even though the cache can become very big (mainly with 500 MB logs!). But it can help with most medium-size logs (small logs are not a problem).
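
As a sketch (MD5 stands in for the CRC here, and CreateWrappedCopy is a hypothetical routine that writes the root-wrapped copy into the cache):

// requires: using System; using System.IO; using System.Security.Cryptography;
static string Checksum(string path)
{
    using (var md5 = MD5.Create())
    using (var fs = File.OpenRead(path))
        return BitConverter.ToString(md5.ComputeHash(fs));
}

static string GetCachedCopy(string originalPath, string cacheFolder)
{
    string checksum = Checksum(originalPath);
    string cachePath = Path.Combine(cacheFolder, Path.GetFileName(originalPath));
    string stampPath = cachePath + ".crc";

    // reuse the cached copy only if the original has not changed
    if (File.Exists(cachePath) && File.Exists(stampPath)
        && File.ReadAllText(stampPath) == checksum)
    {
        return cachePath;
    }

    CreateWrappedCopy(originalPath, cachePath);
    File.WriteAllText(stampPath, checksum);
    return cachePath;
}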

The part that parses the XML files should be isolated in a Task to keep the UI responsive. A progress bar would then be a nice addition.
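
For example (ParseLogFile, BindResults and progressBar are illustrative names; this assumes .NET 4.5's Task.Run and Progress<T>):

// requires: using System; using System.Threading.Tasks;
var progress = new Progress<int>(percent => progressBar.Value = percent);

Task.Run(() => ParseLogFile(logFilePath, progress))   // parse off the UI thread
    .ContinueWith(t => BindResults(t.Result),         // bind back on the UI thread
                  TaskScheduler.FromCurrentSynchronizationContext());

// inside ParseLogFile, report progress as bytes are consumed, e.g.:
//   progress.Report((int)(100 * stream.Position / stream.Length));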

I also think we can work on a way to parse the file using parallel LINQ (PLINQ). This could greatly shorten file loading on most computers (mine has 8 cores, and 2 or 4 cores are very common).
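
A sketch of what I mean, assuming the file is first split into one raw event fragment per string (SplitIntoEvents and ParseEvent are hypothetical helpers):

// requires: using System.Linq;
var items = SplitIntoEvents(logFilePath)
    .AsParallel()
    .AsOrdered()                     // keep the original event order
    .Select(fragment => ParseEvent(fragment))
    .ToList();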

Now, the file size itself can be a problem. A 500 MB log is not a "normal condition" for the application at this time. The question is: must we do something to allow such big files? If yes, what? And if not, how do we prevent crashes (adding a test on file size before parsing? looking at free memory? ...)?
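
For example, a simple guard before parsing (the threshold is arbitrary):

// requires: using System.IO;
const long MaxFileSize = 100 * 1024 * 1024;   // 100 MB, to be tuned

if (new FileInfo(logFilePath).Length > MaxFileSize)
{
    // warn the user instead of dying with an OutOfMemoryException
    MessageBox.Show("This log file is too large to be loaded safely.");
    return;
}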

What do you think about these problems?