odt export: "Text body" paragraph style is misspelled. #948

peter88213 · 2022-01-02T16:24:24Z

Please be aware that style identifiers in OpenOffice/LibreOffice are case-sensitive. In exported ODT documents, a "Text Body" style is assigned to the normal text paragraphs, and LibreOffice 7 treats it as a custom paragraph style. The correct name of the built-in text body style, which is also translated into other languages, is "Text body", with a lowercase "b". This matters when assigning templates to documents.
I suggest correcting this in the toodt module.

Apart from that, I find novelWriter quite appealing. The installation under Xubuntu was without problems.

vkbo · 2022-01-02T16:47:22Z

Hmm, in my version of LibreOffice it is called "Text Body" in the GUI but I see now that if I write an .fodt file the display name in the XML is with a lower case "b". I suppose that's the issue you're referring to? I have my OS set to English, and I see no difference in behaviour with an upper or lower case b that I can detect, but I may not be using it the way you do.

I'm happy to take a PR on this if you could correct it and test that it behaves as you expect on your end. The relevant code is in the _useableStyles function, around line 800 of toodt.py. I suspect all that is needed is to change the display name setting in that one place, but since I don't know how to verify the result, I can't easily check it myself.

peter88213 · 2022-01-02T18:14:19Z

In the meantime, I cloned the source, installed the missing modules (is lxml really needed? What's wrong with Python 3's standard xml.etree?), fixed it locally at line 814 and tested it. This works fine:
oStyle.setDisplayName("Text body")

Unfortunately, my Eclipse PyDev PEP 8 auto-formatter removed all your variable alignment, so I guess you wouldn't want a pull request.
For me, it's almost impossible to track changes this way.

Just for information, the odf xml tag is like <text:p text:style-name="Text_20_body">
This is not just about the GUI representation, it is essential for applying document templates, such as with my styleSwitcher extension.
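To check an export without opening LibreOffice, one could list the paragraph style names referenced in the document body. A minimal sketch, assuming the standard ODF package layout (a ZIP archive with the body in content.xml); used_paragraph_styles is a hypothetical helper, not part of novelWriter:

```python
import zipfile
import xml.etree.ElementTree as ET

# ODF text namespace (OpenDocument 1.x).
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def used_paragraph_styles(odt_file):
    """Return the set of text:style-name values used on <text:p> elements.

    odt_file may be a path or a binary file-like object (anything that
    zipfile.ZipFile accepts).
    """
    with zipfile.ZipFile(odt_file) as odt:
        root = ET.fromstring(odt.read("content.xml"))
    # ElementTree addresses namespaced names in Clark notation: {uri}local.
    attr = "{%s}style-name" % TEXT_NS
    return {
        p.get(attr)
        for p in root.iter("{%s}p" % TEXT_NS)
        if p.get(attr) is not None
    }
```

After the fix, body paragraphs in an export should be reported as Text_20_body.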

By the way, Happy New Year!

vkbo · 2022-01-02T18:23:18Z

The primary feature of lxml that is used is the "pretty print" feature. The standard Python XML package didn't do that until recently, and indented XML is essential if novelWriter projects are to be tracked and diffed with git etc. I'm also not sure that lxml and the standard library package behave identically for all the messy stuff I do in the ToOdt class.

As for autopep8, I don't use it. It is far too brutal for most of the code I work on daily, because most people care little for linting. I use flake8 instead, which only suggests changes.

As for the last point about the style names, those are LibreOffice quirks, aren't they? I never understood what the numbers were for, so I saw no point in adding them. LibreOffice changes them anyway if you save the document again from LibreOffice, and it adds a lot of new XML too. I mostly followed the OpenDocument standard as closely as I could manage.

Edit: Just to clarify, changing to Text_20_body is no issue. I just never saw a reason to add seemingly redundant characters to the labels as I had to type them several places in the code.

vkbo · 2022-01-02T18:31:03Z

Aha, now I get it. _20_ just represents the space character in the display name, as in 0x20. That's a bit obscure, but it makes sense. I guess LibreOffice assumes the display name and style name match in this way.
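The mapping described above can be sketched as a pair of helpers. Beyond what the thread states, this assumes for simplicity that every character other than an ASCII letter, digit, hyphen or period is written as "_" + its lowercase hex code point + "_" (so "_" itself becomes "_5f_"); LibreOffice's exact rule may differ, for example for non-ASCII letters. Both function names are hypothetical:

```python
import re

_SAFE = re.compile(r"[A-Za-z0-9.\-]")
_ESCAPE = re.compile(r"_([0-9a-fA-F]+)_")


def style_name(name):
    """Encode a display name such as "Text body" as an ODF style name."""
    # Assumption: unsafe characters become _<hex>_, e.g. " " (0x20) -> _20_.
    return "".join(
        ch if _SAFE.fullmatch(ch) else "_%x_" % ord(ch)
        for ch in name
    )


def display_name(name):
    """Decode an ODF style name such as "Text_20_body" to its display name."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), name)
```

Because the underscore itself is escaped in this scheme, decoding is unambiguous: every "_" in an encoded name starts or ends an escape.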

It's not an issue changing this. I've queued this up for the next beta release anyway, which I was planning on doing today but got stuck on some old Fortran code instead.

Also, happy new year to you as well :)

peter88213 · 2022-01-02T18:47:45Z

> The primary feature of lxml that is used is the "pretty print" feature.

I see. That's okay, but since I try to avoid dependencies on third-party libraries, I included a small XML pretty-printer snippet by Fredrik Lundh instead, which does the trick perfectly well.
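For reference, here is a version of the widely circulated ElementTree indent recipe usually credited to Fredrik Lundh; the snippet peter88213 actually uses is not shown in the thread and may differ. It inserts whitespace-only text and tail values in place, which is what xml.etree.ElementTree.indent() does natively from Python 3.9:

```python
import xml.etree.ElementTree as ET


def indent(elem, level=0, space="  "):
    """Indent elem and its descendants in place (pre-3.9 fallback)."""
    pad = "\n" + level * space
    children = list(elem)
    if children:
        # Put the first child on a new line, one level deeper.
        if not elem.text or not elem.text.strip():
            elem.text = pad + space
        for child in children:
            indent(child, level + 1, space)
        # The last child's tail closes the block at this element's level.
        last = children[-1]
        if not last.tail or not last.tail.strip():
            last.tail = pad
    # Non-root elements are followed by a newline at their own level
    # (the parent overwrites this for its last child, as above).
    if level and (not elem.tail or not elem.tail.strip()):
        elem.tail = pad
```

Only whitespace-only text and tails are touched, so mixed content such as paragraph text is left alone.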

> As for the last point about the style names, those are LibreOffice quirks, aren't they?

This applies to OpenOffice as well. I wrote some scripts for yWriter that generate ODT and ODS, but I went the easy route of template-based generation instead of dealing with all the intricacies of the XML format. I just copied bits and pieces from an existing document and nothing could go wrong.

vkbo · 2022-01-02T19:17:45Z

I made a PR #949 that should fix the style names. Could you possibly test it before I merge it?

Anyway, I know it's fairly simple to write this for XML. I've written the reverse code myself at my previous job. It's on my todo list, but not high priority. lxml is a good library and is almost always installed on Linux PCs. I agree with the minimal dependency approach though. I've dropped a number of dependencies in novelWriter since I started. I even had a spell check implementation based on difflib a while back, but having to distribute dictionaries was annoying.

I wrote my own pretty printer for JSON, which will only indent up to a given level. It wastes a lot less space than the default one, and is still diff-friendly.

novelWriter/novelwriter/common.py

Lines 399 to 443 in 79657a9

```python
import json


def jsonEncode(data, n=0, nmax=0):
    """Encode a dictionary, list or tuple as a json object or array, and
    indent from level n up to a max level nmax if nmax is larger than 0.
    """
    if not isinstance(data, (dict, list, tuple)):
        return "[]"

    buffer = []
    indent = ""
    for chunk in json.JSONEncoder().iterencode(data):
        if chunk == "":  # pragma: no cover
            # Just a precaution
            continue

        first = chunk[0]
        if chunk in ("{}", "[]"):
            buffer.append(chunk)

        elif first in ("{", "["):
            n += 1
            indent = "\n"+"  "*n
            if n > nmax and nmax > 0:
                buffer.append(chunk)
            else:
                buffer.append(chunk[0] + indent + chunk[1:])

        elif first in ("}", "]"):
            n -= 1
            indent = "\n"+"  "*n
            if n >= nmax and nmax > 0:
                buffer.append(chunk)
            else:
                buffer.append(indent + chunk)

        elif first == ",":
            if n > nmax and nmax > 0:
                buffer.append(chunk)
            else:
                buffer.append(chunk[0] + indent + chunk[1:].lstrip())

        else:
            buffer.append(chunk)

    return "".join(buffer)
```

peter88213 · 2022-01-02T19:39:46Z

Well, I checked out the odt_libreoffice_friendly branch, started the application and loaded a small dummy project. The text body is now correct. I loaded the result in OpenOffice 3.4.1 and LibreOffice 7.1.8.
I guess you can also verify it yourself by looking at the paragraph styles sidebar (press F11) in OpenOffice/LibreOffice. A file exported by the old build might show the text body style twice: once as the built-in style, from which other styles such as "first line indent" inherit, and once in the "user defined" section.

Your JSON processor looks very impressive. Hobby programmer that I am, I used the following standard method instead:
json.dumps(jsonData, indent=4, sort_keys=True, ensure_ascii=False)
The indent parameter does the pretty printing.
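For comparison, the standard call indents every nesting level, so even a short list spreads over several lines, which is the space cost that a maximum-depth parameter avoids. A small illustration with made-up data:

```python
import json

data = {"b": [1, 2], "a": "ä"}  # illustrative data, not from a real project

text = json.dumps(data, indent=4, sort_keys=True, ensure_ascii=False)
print(text)
# {
#     "a": "ä",
#     "b": [
#         1,
#         2
#     ]
# }
```

With ensure_ascii=False the "ä" is written as-is instead of as a \u escape.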

vkbo · 2022-01-02T19:46:05Z

Thanks for testing it. I did not see duplicate styles in my tree actually, which is why I couldn't reproduce the mentioned issue. Anyway, I'll merge this for 1.6 and do the release tomorrow or when I have the time.

As for the JSON indent, yes, the internal library has it, but it lacks the "indent up to level X" feature that I wanted. I'm considering contributing it to the Python library, but the use case may be too narrow for inclusion.

On the subject of pretty printing XML, Python added it in 3.9. Since I have to support at least 3.7 and 3.8 for some time still, I have considered just copying it over. If so, I can drop lxml. The code is here: https://github.com/python/cpython/blob/863729e9c6f599286f98ec37c8716e982c4ca9dd/Lib/xml/etree/ElementTree.py#L1165

peter88213 · 2022-01-02T19:54:19Z

Thank you for fixing so fast.

> On the subject of pretty printing XML, Python added it in 3.9.

That's good news. However, my Xubuntu distro still comes with Python 3.8. And since I distribute plain Python scripts, I'm keen to make them run with the lowest Python version possible.

peter88213 · 2022-01-02T20:04:32Z

A little off-topic, but I just had an idea about limiting indentation to a certain level: First format it with the standard method, and then delete the excess leading blanks line by line.
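The idea can be sketched roughly as follows. Simply deleting leading blanks would leave each value on its own line, so this version instead joins lines deeper than the limit back onto the previous line. json_dumps_limited is a hypothetical name, its nmax is only meant to mimic the nmax of jsonEncode above, and the approach relies on json.dumps never emitting raw newlines inside strings (it escapes them):

```python
import json


def json_dumps_limited(data, nmax, width=2):
    """Pretty-print data as JSON, breaking lines only up to nesting level nmax.

    Deeper structures are collapsed onto a single line. nmax <= 0 means
    no limit (plain json.dumps output).
    """
    text = json.dumps(data, indent=width)
    if nmax <= 0:
        return text
    out = []
    for line in text.splitlines():
        stripped = line.lstrip(" ")
        depth = (len(line) - len(stripped)) // width
        # Lines deeper than nmax, and the closing bracket of a block opened
        # at depth nmax, are appended to the previous line instead.
        if depth > nmax or (depth == nmax and stripped[:1] in ("]", "}")):
            # json.dumps uses "," as item separator when indenting; restore ", ".
            sep = " " if out[-1].endswith(",") else ""
            out[-1] += sep + stripped
        else:
            out.append(line)
    return "\n".join(out)
```

For example, json_dumps_limited({"a": [1, 2], "b": {"c": 1}}, nmax=1) keeps each top-level key on its own line and writes "a": [1, 2] and "b": {"c": 1} inline.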

vkbo · 2022-01-02T20:50:45Z

> That's good news. However, my Xubuntu distro still comes with Python 3.8. And since I distribute plain Python scripts, I'm keen to make them run with the lowest Python version possible.

novelWriter works with 3.6 still, and I don't plan to drop that support until I have to.

vkbo · 2022-03-28T20:36:34Z

Just an update on lxml vs Python xml that we discussed here.

I just tried, in a branch, dropping lxml by copying over the indent function from the Python 3.9 source so it works on older versions. That was all fine. However, I quickly found out that another difference between the two implementations is how they handle namespaces. The core XML for novelWriter doesn't use them, so no problem there, but they are all over the place in the ODT writer class, and I think lxml handles them a lot better. The changes needed in the ODT writer were so extensive that I didn't think it was worth the effort, so I'll keep lxml for now.
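To illustrate the namespace difference: with the standard library, namespaced tags and attributes are written in Clark notation ({uri}name), and prefixes come from a module-global registry via ET.register_namespace(), whereas lxml lets each element carry its own nsmap. A minimal stdlib sketch of one ODF paragraph style; the odf() helper is hypothetical and the margin value is illustrative, not novelWriter's output:

```python
import xml.etree.ElementTree as ET

# Namespace URIs from the OpenDocument 1.x specification.
NS = {
    "style": "urn:oasis:names:tc:opendocument:xmlns:style:1.0",
    "fo": "urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0",
}
for prefix, uri in NS.items():
    # Global registry: affects every later serialisation in this process.
    ET.register_namespace(prefix, uri)


def odf(qname):
    """Turn "style:name" into Clark notation "{uri}name"."""
    prefix, local = qname.split(":")
    return "{%s}%s" % (NS[prefix], local)


style = ET.Element(odf("style:style"), {
    odf("style:name"): "Text_20_body",
    odf("style:display-name"): "Text body",
    odf("style:family"): "paragraph",
})
ET.SubElement(style, odf("style:paragraph-properties"), {
    odf("fo:margin-bottom"): "0.25cm",  # illustrative value
})
serialized = ET.tostring(style, encoding="unicode")
```

ElementTree declares every namespace used in the tree on the root element when serialising, so the xmlns:style and xmlns:fo declarations both end up on style:style.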

It was worth a quick try anyway.

peter88213 · 2022-03-29T08:48:10Z

That's interesting. Well, one thing leads to another. If I understand correctly, with the ToOdt class you have a sophisticated document generator that goes deep into the details of the ODT format and builds the XML trees from scratch.
Since you're already on Qt, what's wrong with its QTextDocumentWriter? Can't it be connected to the tokenizer?

vkbo · 2022-03-29T09:19:11Z

I wrote the ToOdt class to replace the Qt ODT writer which was used before. The Qt implementation is very basic and is missing a lot of things. It doesn't even write text headers. I also wanted to control the formatting of text paragraphs vs metadata, and control page header/footer formatting.

peter88213 · 2022-03-29T16:25:05Z

I see. However, as a user I would rely on the superior formatting capabilities of OpenOffice anyway, so a simple document with an emphasis on clean structure and the strict application of paragraph/character styles is enough for me in my programs. My yWriter-to-ODT exporter writes, e.g., the author and title as document metadata, so a header like the one generated by novelWriter's export can be added via the page style at any time. Of course, this requires that users know how to handle OpenOffice/LibreOffice.

One thing that is not so easy to do afterwards is the different formatting of the first paragraph after a heading or blank line (text body) and the following paragraphs (first line indent). The document generator takes care of that for me.
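The rule described here (body style for the first paragraph after a heading or blank line, indented style for the following ones) fits in a small state machine. A sketch using a hypothetical token format of (kind, text) tuples; the style names "Heading", "Text body" and "First line indent" are assumed placeholders, and this is not how novelWriter's tokenizer or peter88213's generator actually represents documents:

```python
def assign_styles(tokens):
    """Yield (style, text) pairs for a sequence of (kind, text) tokens.

    kind is one of "heading", "blank" or "paragraph" (hypothetical format).
    """
    after_break = True  # The start of the document counts as a break.
    for kind, text in tokens:
        if kind == "heading":
            yield "Heading", text
            after_break = True
        elif kind == "blank":
            # Blank lines produce no output but reset the paragraph style.
            after_break = True
        else:
            yield ("Text body" if after_break else "First line indent"), text
            after_break = False
```

Keeping the choice in the generator means the exported document only references named styles, so a template can later restyle both kinds of paragraph.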

peter88213 added the bug Issue: Something isn't working label Jan 2, 2022

vkbo added this to the Release 1.6 Beta 1 milestone Jan 2, 2022

vkbo mentioned this issue Jan 2, 2022

Make ODT export more LibreOffice friendly #949

Merged


vkbo self-assigned this Jan 2, 2022

vkbo closed this as completed in #949 Jan 2, 2022

HeyMyian mentioned this issue Feb 1, 2023

Manuscript Export #622

Closed
