I’ve got an object with a short string attribute, and a long multi-line string attribute. I want to write the short string as a YAML quoted scalar, and the multi-line string as a literal scalar:
my_obj.short = "Hello"
my_obj.long = "Line1\nLine2\nLine3"
I’d like the YAML to look like this:
short: "Hello"
long: |
  Line1
  Line2
  Line3
How can I instruct PyYAML to do this? If I call yaml.dump(my_obj), it produces a dict-like output:
{long: 'line1

    line2

    line3

    ', short: Hello}
(Not sure why long is double-spaced like that…)
Can I dictate to PyYAML how to treat my attributes? I’d like to affect both the order and style.
Answers:
Method 1
Falling in love with @lbt’s approach, I got this code:
import yaml

def str_presenter(dumper, data):
    if len(data.splitlines()) > 1:  # check for multiline string
        return dumper.represent_scalar('tag:yaml.org,2002:str', data, style='|')
    return dumper.represent_scalar('tag:yaml.org,2002:str', data)

yaml.add_representer(str, str_presenter)

# to use with safe_dump:
yaml.representer.SafeRepresenter.add_representer(str, str_presenter)
It dumps every multi-line string as a block literal.
I was trying to avoid the monkey patching part.
Full credit to @lbt and @J.F.Sebastian.
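If registering the representer globally is the part you want to avoid, one possible variation is to attach it to a dumper subclass and pass that subclass to yaml.dump. This is only a sketch: the class name BlockStrDumper is made up for illustration, and it relies on add_representer being a classmethod that copies the representer table for the class it is called on.

```python
import yaml


def str_presenter(dumper, data):
    # Use literal block style only for strings that span several lines.
    if len(data.splitlines()) > 1:
        return dumper.represent_scalar('tag:yaml.org,2002:str', data, style='|')
    return dumper.represent_scalar('tag:yaml.org,2002:str', data)


class BlockStrDumper(yaml.SafeDumper):
    """Hypothetical dumper subclass; only this class gets the custom representer."""


# Registered on the subclass only, so yaml.dump / yaml.safe_dump keep their defaults.
BlockStrDumper.add_representer(str, str_presenter)

data = {'short': 'Hello', 'long': 'Line1\nLine2\nLine3\n'}
block_output = yaml.dump(data, Dumper=BlockStrDumper, default_flow_style=False)
print(block_output)
```

Calls that do not pass Dumper=BlockStrDumper should behave as before, which keeps the change local to the code that opts in.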
Method 2
import yaml
from collections import OrderedDict

class quoted(str):
    pass

def quoted_presenter(dumper, data):
    return dumper.represent_scalar('tag:yaml.org,2002:str', data, style='"')
yaml.add_representer(quoted, quoted_presenter)

class literal(str):
    pass

def literal_presenter(dumper, data):
    return dumper.represent_scalar('tag:yaml.org,2002:str', data, style='|')
yaml.add_representer(literal, literal_presenter)

def ordered_dict_presenter(dumper, data):
    return dumper.represent_dict(data.items())
yaml.add_representer(OrderedDict, ordered_dict_presenter)

d = OrderedDict(short=quoted("Hello"), long=literal("Line1\nLine2\nLine3\n"))

print(yaml.dump(d))
Output
short: "Hello"
long: |
  Line1
  Line2
  Line3
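Since the question is about an object with attributes rather than a dict, here is a rough sketch of how the marker-type idea might be applied to one. The MyObj class and its to_yaml method are hypothetical names, and the sketch assumes Python 3.7+ (ordered dicts) and PyYAML 5.1+ (for the sort_keys argument) instead of OrderedDict.

```python
import yaml


class quoted(str):
    """Marker type: dump as a double-quoted scalar."""


class literal(str):
    """Marker type: dump as a literal block scalar."""


def quoted_presenter(dumper, data):
    return dumper.represent_scalar('tag:yaml.org,2002:str', data, style='"')


def literal_presenter(dumper, data):
    return dumper.represent_scalar('tag:yaml.org,2002:str', data, style='|')


yaml.add_representer(quoted, quoted_presenter)
yaml.add_representer(literal, literal_presenter)


class MyObj:  # hypothetical stand-in for the object in the question
    def __init__(self, short, long):
        self.short = short
        self.long = long

    def to_yaml(self):
        # A plain dict keeps insertion order on Python 3.7+;
        # sort_keys=False (PyYAML 5.1+) stops yaml.dump from re-sorting the keys.
        mapping = {'short': quoted(self.short), 'long': literal(self.long)}
        return yaml.dump(mapping, sort_keys=False, default_flow_style=False)


text = MyObj("Hello", "Line1\nLine2\nLine3\n").to_yaml()
print(text)
```

The object decides both the key order (by how it builds the mapping) and the style (by which marker type wraps each value).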
Method 3
I wanted any input with a \n in it to be a block literal. Using the code in yaml/representer.py as a base I got:
# -*- coding: utf-8 -*-
import yaml

def should_use_block(value):
    for c in u"\u000a\u000d\u001c\u001d\u001e\u0085\u2028\u2029":
        if c in value:
            return True
    return False

def my_represent_scalar(self, tag, value, style=None):
    if style is None:
        if should_use_block(value):
            style = '|'
        else:
            style = self.default_style

    node = yaml.representer.ScalarNode(tag, value, style=style)
    if self.alias_key is not None:
        self.represented_objects[self.alias_key] = node
    return node

a = {'short': "Hello", 'multiline': """Line1
Line2
Line3
""", 'multiline-unicode': u"""Lêne1
Lêne2
Lêne3
"""}

print(yaml.dump(a))
print(yaml.dump(a, allow_unicode=True))
yaml.representer.BaseRepresenter.represent_scalar = my_represent_scalar
print(yaml.dump(a))
print(yaml.dump(a, allow_unicode=True))
Output
{multiline: 'Line1

    Line2

    Line3

    ', multiline-unicode: "L\xEAne1\nL\xEAne2\nL\xEAne3\n", short: Hello}

{multiline: 'Line1

    Line2

    Line3

    ', multiline-unicode: 'Lêne1

    Lêne2

    Lêne3

    ', short: Hello}
After override
multiline: |
  Line1
  Line2
  Line3
multiline-unicode: "L\xEAne1\nL\xEAne2\nL\xEAne3\n"
short: Hello

multiline: |
  Line1
  Line2
  Line3
multiline-unicode: |
  Lêne1
  Lêne2
  Lêne3
short: Hello
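The same idea can probably be confined to a single dumper instead of replacing BaseRepresenter.represent_scalar for everyone. The following is a minimal sketch under that assumption: BlockDumper and LINE_BREAKS are names invented here, and the override simply picks '|' when no style was requested and the value contains a line-break character, then defers to the stock implementation.

```python
import yaml

# Same set of line-break characters checked in the answer above.
LINE_BREAKS = "\n\r\x1c\x1d\x1e\x85\u2028\u2029"


class BlockDumper(yaml.SafeDumper):
    """Hypothetical dumper that prefers literal blocks for multi-line scalars."""

    def represent_scalar(self, tag, value, style=None):
        # Only step in when the caller did not ask for a specific style.
        if style is None and any(c in value for c in LINE_BREAKS):
            style = '|'
        # The parent method falls back to default_style and records aliases.
        return super().represent_scalar(tag, value, style=style)


a = {'short': "Hello", 'multiline': "Line1\nLine2\n"}
block_dump = yaml.dump(a, Dumper=BlockDumper, default_flow_style=False)
print(block_dump)
```

As in the output above, non-ASCII text would still need allow_unicode=True to come out as a block rather than an escaped double-quoted string.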
Method 4
You can use ruamel.yaml and its RoundTripLoader/Dumper (disclaimer: I am the author of that package). Apart from doing what you want, it supports the YAML 1.2 specification (from 2009) and has several other improvements:
import sys
from ruamel.yaml import YAML

yaml_str = """\
short: "Hello"  # does keep the quotes, but need to tell the loader
long: |
  Line1
  Line2
  Line3
folded: >
  some like
  explicit folding
  of scalars
  for readability
"""

yaml = YAML()
yaml.preserve_quotes = True
data = yaml.load(yaml_str)
yaml.dump(data, sys.stdout)
gives:
short: "Hello"  # does keep the quotes, but need to tell the loader
long: |
  Line1
  Line2
  Line3
folded: >
  some like
  explicit folding
  of scalars
  for readability
(including the comment, starting in the same column as before)
You can also create this output starting from scratch, but then you do need to provide the extra information, e.g. the explicit positions where to fold.
Method 5
It’s worth noting that pyyaml disallows trailing spaces in block scalars and will force content into double-quoted format. It seems a lot of folk have run into this issue. If you don’t care about being able to round-trip the data, this will strip out those trailing spaces:
import yaml

def str_presenter(dumper, data):
    if len(data.splitlines()) > 1 or '\n' in data:
        # strip trailing spaces so the emitter can keep the block style
        text_list = [line.rstrip() for line in data.splitlines()]
        fixed_data = "\n".join(text_list)
        return dumper.represent_scalar('tag:yaml.org,2002:str', fixed_data, style='|')
    return dumper.represent_scalar('tag:yaml.org,2002:str', data)

yaml.add_representer(str, str_presenter)
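To see the trailing-space behaviour in isolation, here is a small sketch that dumps the same text twice, once as-is and once with trailing spaces stripped. LiteralDumper and literal_str are hypothetical names; unlike the snippet above, this version re-appends the final newline, which is a design choice of the sketch and not part of the original answer.

```python
import yaml


class LiteralDumper(yaml.SafeDumper):
    """Hypothetical dumper that requests '|' style for multi-line strings."""


def literal_str(dumper, data):
    style = '|' if '\n' in data else None
    return dumper.represent_scalar('tag:yaml.org,2002:str', data, style=style)


LiteralDumper.add_representer(str, literal_str)

with_spaces = "Line1  \nLine2\n"
# Strip trailing whitespace per line, keeping the final newline.
stripped = "\n".join(line.rstrip() for line in with_spaces.splitlines()) + "\n"

raw_dump = yaml.dump({'text': with_spaces}, Dumper=LiteralDumper, default_flow_style=False)
clean_dump = yaml.dump({'text': stripped}, Dumper=LiteralDumper, default_flow_style=False)
print(raw_dump)    # the emitter falls back to a double-quoted scalar
print(clean_dump)  # the literal block style is kept
```

The stripped version no longer loads back to the original string, which is the round-trip cost mentioned above.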
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.