Completely hangs with a "large" file

Chuck

31 Dec, 2016 07:15 AM

Hi,

I've got a file that is 571 lines, 26277 characters total. If I try to run it through multimarkdown it simply hangs. If I chop 200 lines off the front, or 200 lines off the back, the remaining lines process fine. My guess is that it is breaking the parser somehow (that is where the executable gets stuck). I tried to enable debugging in the parser by #defining YY_DEBUG, but those fprintf's to stderr are not showing up on standard error for some reason. How can I debug this?

--Chuck

  1. Posted by Chuck on 31 Dec, 2016 07:31 AM

    FWIW, looking through the code, this symptom looks like the 'endless loop' bug that is mentioned in a comment in process_raw_block.

    --Chuck

  2. Posted by fletcher (Support Staff) on 31 Dec, 2016 11:59 AM

    Chuck,

    Happy new year!

    That actually isn't that big of a file, so size itself isn't the problem.

    If you send me the file I can take a look and see if there's anything peculiar about the structure and run MMD through a debugger and see where it's getting hung up.

    It's been a long time since I found a file that caused MultiMarkdown to hang. Which version are you using?

    Fletcher

  3. Posted by Chuck McManis on 31 Dec, 2016 06:58 PM

    Hi Fletcher, file is attached

    I managed to get yyparse debugging enabled and have a 13G dump of the output
    of yyparse, which I'm going to crawl through if something doesn't pop out to
    you. I'm using the latest version; I pulled it from GitHub and rebuilt from
    source.

    --Chuck

  4. Posted by fletcher (Support Staff) on 01 Jan, 2017 12:34 AM

    Correct me if I'm misreading the file, but it's not valid HTML, correct?
    I count one extra <div> for every "chunk" compared to </div>'s.

    FTP
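
A count like the one described in this reply can be sketched in a few lines of Python. This is only an illustration: the `div_balance` helper and the sample text are hypothetical, and the regular expressions are a rough approximation of HTML tags (a self-closing `<div/>` would be counted as an opener), not a real HTML parser.

```python
import re

# Hypothetical helper, not part of MultiMarkdown: a rough tag count by regex.
OPEN_DIV = re.compile(r"<div\b[^>]*>", re.IGNORECASE)
CLOSE_DIV = re.compile(r"</div\s*>", re.IGNORECASE)


def div_balance(text):
    """Return (openers, closers) for the div tags found in text."""
    return len(OPEN_DIV.findall(text)), len(CLOSE_DIV.findall(text))


# Made-up sample "chunk" with one opener more than it has closers.
sample = '<div class="chunk">\n<div>\nSome *markdown* text\n</div>\n'
opens, closes = div_balance(sample)
if opens != closes:
    print(f"unbalanced: {opens} <div> vs {closes} </div>")
```

Running a check like this on the generated file before handing it to the Markdown processor would flag a template that is missing its closing tag.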

  5. Posted by Chuck McManis on 01 Jan, 2017 01:09 AM

    That it isn't valid HTML is certainly possible; it is the penultimate step
    before my CMS considers it ready to publish. Once the markdown in
    the file has been processed, it becomes the contents value of another
    template which puts things around it. Looking at the enclosing template,
    though, it shouldn't have unmatched divs. I'll check that in my CMS code.
    I also spent a lot of time bisecting the file to figure out if there was a
    line that hung the parser, and eventually discovered that after bisecting
    it anywhere, both "halves" of the result would process.

    --Chuck

  6. Posted by fletcher (Support Staff) on 01 Jan, 2017 01:33 AM

    Probably by cutting the number of divs in half, the problem of matching openers and closers becomes more tractable.

    PEGs don't do well in processing "almost but not quite" properly formed text like this. ;)

  7. Posted by Chuck McManis on 01 Jan, 2017 06:31 AM

    Hi Fletcher,

    Thanks for the clue! I went back into my site generation code and found the
    template that was missing the closing div. I fixed that and the site
    published as expected (and the index page went back to being legal HTML at
    that point). So it seems like the issue was too many open <div> tags. That
    said, I wonder if multimarkdown could catch that and error out rather than
    hang forever?

    --Chuck
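
Until the processor itself can bail out, one workaround on the calling side is to run it under a time limit. The sketch below uses Python's `subprocess.run` with its `timeout` argument; the `run_with_timeout` helper is hypothetical, and the commented-out `multimarkdown` call assumes the executable is on the PATH and reads its input from stdin.

```python
import subprocess
import sys


def run_with_timeout(cmd, text, seconds=10):
    """Run cmd with text on stdin; return its stdout, or None on timeout."""
    try:
        done = subprocess.run(
            cmd, input=text, capture_output=True, text=True, timeout=seconds
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising this.
        return None
    return done.stdout


# Stand-in for a hanging converter so the sketch runs anywhere.
slow = [sys.executable, "-c", "import time; time.sleep(5)"]
if run_with_timeout(slow, "", seconds=0.5) is None:
    print("converter timed out; check the input for unbalanced tags")

# Assumed real-world call (not executed here):
# html = run_with_timeout(["multimarkdown"], open("page.md").read())
```

A site generator could treat a `None` result as a build error instead of waiting indefinitely.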

  8. Posted by fletcher (Support Staff) on 01 Jan, 2017 12:22 PM

    The problem is the way PEGs work: the parser keeps trying to find matched pairs of opening and closing tags that would represent HTML blocks. It "knows" that each opener should have a closer, but struggles to figure out how to pair them up when the numbers are wrong. It has to check the entire rest of the document, because it's possible there are a bunch of closers all together at the end, but it never finds them. And it has to do that every time it finds a new opener. And you have a lot of openers in your document.
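
The cost of that repeated rescanning can be illustrated with a toy model. The sketch below is an assumption-laden simplification, not MultiMarkdown's actual grammar: it works on tokens ("O" for an opener, "C" for a closer, "T" for text) and implements a single hypothetical backtracking rule, `Block <- Open (Block / !Close any)* Close`, with no memoization, counting how many times a block match is attempted.

```python
def match_block(tokens, pos, counter):
    """Try Block <- Open (Block / !Close any)* Close at pos.

    Returns the index just past the block, or None if no block matches.
    """
    if pos >= len(tokens) or tokens[pos] != "O":
        return None
    counter[0] += 1  # one more attempt that starts at an opener
    i = pos + 1
    while i < len(tokens) and tokens[i] != "C":
        end = match_block(tokens, i, counter)  # nested block is tried first
        i = end if end is not None else i + 1  # otherwise skip one token
    return i + 1 if i < len(tokens) else None  # need a closer to succeed


def count_attempts(tokens):
    """Scan a whole token list, trying a block at every position."""
    counter = [0]
    pos = 0
    while pos < len(tokens):
        end = match_block(tokens, pos, counter)
        pos = end if end is not None else pos + 1
    return counter[0]


for chunks in (5, 10, 15):
    balanced = ["O", "T", "C"] * chunks
    extra_opener = ["O", "O", "T", "C"] * chunks  # one spare opener per chunk
    print(chunks, count_attempts(balanced), count_attempts(extra_opener))
```

In this toy model the balanced input needs one attempt per chunk, while the input with a spare opener in every chunk needs a number of attempts that roughly doubles with each added chunk, which is consistent with a file of a few hundred lines appearing to hang.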

    I'm working on MultiMarkdown 6, which works in an entirely different way and should be ok on this. But it's not ready yet.

    Ftp
