multimarkdown generates invalid XML from smart quotes and dashes

rich.miller.6

10 Oct, 2015 04:53 AM

Multimarkdown version 4.7.1 on Windows takes the attached 4-line file (which was prepared by cut-and-paste from Google Docs, then worked on in Scrivener) and produces invalid XML. Firefox points precisely to the first apostrophe as the offending character.

Multimarkdown should presumably never produce invalid XML. For characters like the ones in my example, it should preferably do "the right thing", or give a precise description of the problem so it can be easily fixed.

It is also possible that Scrivener should process the text so that this problem doesn't show up. But whether Scrivener is changed or not, multimarkdown is still generating broken XML from a simple input file.

Best regards,
Rich Mauritz-Miller
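
A minimal Python sketch of the failure described above. It assumes the pasted text ended up stored in Windows-1252; the thread never names the original encoding. Without an encoding declaration or byte-order mark, an XML parser reads the bytes as UTF-8. A smart apostrophe stored as the single byte 0x92 is therefore rejected at exactly that character, which is consistent with what Firefox reported:

```python
import xml.etree.ElementTree as ET

# U+2019 is the "smart" apostrophe; U+2014 is an em dash.
text = "<p>It\u2019s a test \u2014 with a dash.</p>"

# Encoded as UTF-8, the parser accepts it, because XML without an
# encoding declaration (or BOM) is read as UTF-8.
utf8_bytes = text.encode("utf-8")
print(ascii(ET.fromstring(utf8_bytes).text))

# Encoded as Windows-1252 (an ASSUMED legacy encoding), the apostrophe
# becomes the single byte 0x92, which is not valid UTF-8.
cp1252_bytes = text.encode("cp1252")
try:
    ET.fromstring(cp1252_bytes)
except ET.ParseError as err:
    line, column = err.position
    print(f"parse error at line {line}, column {column}: {err}")
```

The same characters are fine in either encoding. The document only breaks when the bytes and the encoding the reader assumes disagree.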

  1. Posted by fletcher (Support Staff) on 11 Oct, 2015 06:51 PM

    What output are you getting? It works fine for me, and the HTML validates at https://validator.w3.org/ when used in a complete HTML document.

    (There's no such thing as valid or invalid XML/HTML without being a complete document. By itself, the text you sent is not a complete document.)

    F-

    --
    Fletcher T. Penney
    [email blocked]

  2. Posted by rich.miller.6 on 11 Oct, 2015 08:47 PM

    Hi Fletcher,

    Thanks for getting back to me! I see that I created an incomplete bug
    report. The problem occurs when I use the "mmd2odf" batch file to convert
    the four-line file I sent you into an .fodt file. LibreOffice refuses to
    open the resulting file (attached, renamed to 'test.fodt').

    Thanks for pointing me to the XML validator. Renaming test.fodt to test.xml
    and attempting to validate it as XML generates this message from the
    validator:

    Missing "charset" attribute for "text/xml" document.

    It also reports that line 51 contains a character that isn't in the
    US-ASCII character set.

    So... is this a problem with the header information that Scrivener is
    generating - or does MMD do this? Or is Scrivener outputting characters in
    the body of the text that it should convert to something else? Or is MMD
    not noticing the presence of input characters that are not in the assumed
    character set, and thereby generating an invalid document?

    Thanks for your help with this - much appreciated!

    Best regards,
    Rich

    Rich Mauritz-Miller

  3. Posted by fletcher (Support Staff) on 11 Oct, 2015 10:41 PM

    The problem is apparently the file's encoding. Save your text files as UTF-8, and that should allow proper output when processed by MMD.

    When I converted the file to UTF-8 on my Mac, it worked just fine.

    F-

    --
    Fletcher T. Penney
    [email blocked]

  4. Posted by rich.miller.6 on 11 Oct, 2015 11:11 PM

    Thanks, Fletcher - I appreciate your help.

    I noticed that MultiMarkdown is written in Perl. I also noticed that MMD
    doesn't have a lot of bugs in the bug database. Coincidence? :-)

    A quick/naive question: how do I convert my files to UTF-8 format, as you
    just did?

    I will forward this information to the Scrivener folks.

    Finally, I'm sending you a LinkedIn invitation. It seems we're both
    interested in Comp. Sci. and healthcare.

    Best regards,
    Rich

  5. Posted by fletcher (Support Staff) on 11 Oct, 2015 11:23 PM

    The old MMD was in Perl -- Markdown was in Perl, and MMD started as a fork of Markdown.

    MMD v3 and v4 are both in C.

    There aren't a lot of active bugs because the project is old (original MMD was 11 years ago or so), and I've built up some decent test suites over the years.

    But mostly because a large number of users help me find bugs and fix them pretty quickly, so they don't sit around for too long.

    Any good general text editor should be able to convert encoding. The problem is getting the encoding read properly when opening the file.

    On my Mac, TextWrangler could not interpret the "special" characters. MultiMarkdown Composer opens the file and displays it as "Chinese" characters. Sublime Text opened it properly, and then easily saved it as UTF-8. I know Sublime Text has a single license that is good on Mac, Linux, and Windows. It's a solid app and way more powerful than I truly take advantage of. But it works well.

    Of course, the best approach is to save using the proper encoding to begin with. ;) Usually the "Save As" dialog in any decent text editor has options to specify the encoding. Not sure what Scrivener does. I don't remember offhand, but I *think* Notepad can do it.

    F-

    --
    Fletcher T. Penney
    [email blocked]
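
For readers who would rather script the fix than find the right editor, here is a minimal Python sketch that first reports where a file stops being valid UTF-8, which is the kind of precise diagnosis asked for at the top of the thread. It then writes a UTF-8 copy. The script name `to_utf8.py` and its functions are hypothetical, not part of MultiMarkdown or Scrivener. It also ASSUMES the source file is Windows-1252; the thread never confirms the original encoding.

```python
import sys
from pathlib import Path


def find_invalid_utf8(data: bytes):
    """Return (line, column, byte) for the first byte that is not valid
    UTF-8, or None if the whole buffer decodes cleanly. Line and column
    are 1-based; the column counts bytes, not characters."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        before = data[: err.start]
        line = before.count(b"\n") + 1
        # rfind returns -1 when there is no newline, so the line starts at 0.
        line_start = before.rfind(b"\n") + 1
        column = err.start - line_start + 1
        return line, column, data[err.start]
    return None


def convert_to_utf8(src: Path, dst: Path, source_encoding: str = "cp1252") -> None:
    """Re-encode src as UTF-8, ASSUMING it was saved in source_encoding.
    Working on raw bytes means no newline translation, so Windows CRLF
    line endings are preserved."""
    # Raises UnicodeDecodeError if the file contains bytes that are
    # undefined in source_encoding, a sign the guess is wrong.
    text = src.read_bytes().decode(source_encoding)
    dst.write_bytes(text.encode("utf-8"))


def report_and_convert(src: Path, dst: Path) -> None:
    problem = find_invalid_utf8(src.read_bytes())
    if problem is None:
        print(f"{src} is already valid UTF-8; no copy written")
        return
    line, column, byte = problem
    print(f"{src}: byte 0x{byte:02X} at line {line}, column {column} is not valid UTF-8")
    convert_to_utf8(src, dst)
    print(f"wrote UTF-8 copy to {dst}")


if __name__ == "__main__":
    if len(sys.argv) == 3:
        report_and_convert(Path(sys.argv[1]), Path(sys.argv[2]))
    else:
        print("usage: python to_utf8.py INPUT OUTPUT", file=sys.stderr)
```

For example, `python to_utf8.py badchars.mmd badchars-utf8.mmd` (hypothetical output name) would report the first offending byte and write a converted copy. As Fletcher points out, the hard part is knowing how the file was originally encoded. Windows-1252 is only a guess here, so if the converted text looks wrong, try a different `source_encoding`.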

Attached file: badchars.mmd (138 bytes)