• First two bytes of 'stdout' are lost

    From Olivier B.@perso.olivier.barthelemy@gmail.com to comp.lang.python on Thu Apr 11 14:42:22 2024
    From Newsgroup: comp.lang.python

    I am trying to use StringIO to capture stdout, in code that looks like this:

    import sys
    from io import StringIO
    old_stdout = sys.stdout
    sys.stdout = mystdout = StringIO()
    print( "patate")
    mystdout.seek(0)
    sys.stdout = old_stdout
    print(mystdout.read())

    Well, it is not exactly like this, since this works properly.
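
    For comparison, the same capture can be written with the standard
    library's contextlib.redirect_stdout, which restores sys.stdout
    automatically. This is only a sketch of the pure-Python case; it says
    nothing about how the embedded C++ host behaves.

```python
import io
from contextlib import redirect_stdout

buf = io.StringIO()

# sys.stdout points at buf only inside the with block.
with redirect_stdout(buf):
    print("patate")

# getvalue() returns everything written, regardless of stream position.
captured = buf.getvalue()
print(repr(captured))
```

    Because getvalue() is used instead of seek(0) plus read(), the result
    does not depend on where the stream position happens to be.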

    This code is actually run from C++ using the Python C API.
    This worked quite well, so the code was right at some point. But now,
    two things have changed:
    - Now using Python 3.11.7 instead of 3.7.12
    - Now using only the Python limited C API

    And it seems that now, mystdout.read() always misses the first two
    characters that have been written to stdout.

    My first idea was that a BOM was being improperly truncated at some
    point, but I am manipulating UTF-8, so the BOM would be 3 bytes,
    not 2.
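
    The BOM sizes mentioned here can be checked with the standard codecs
    module. This is a small sketch at the bytes level only; a StringIO
    holds str objects, so no encoding step (and no BOM) is involved in the
    pure-Python snippet above.

```python
import codecs

# Lengths of the byte order marks, in bytes.
bom_utf8_len = len(codecs.BOM_UTF8)       # EF BB BF
bom_utf16_len = len(codecs.BOM_UTF16_LE)  # FF FE

plain = "patate".encode("utf-8")         # the utf-8 codec adds no BOM
with_sig = "patate".encode("utf-8-sig")  # utf-8-sig prepends the UTF-8 BOM
utf16 = "patate".encode("utf-16")        # utf-16 prepends a native-order BOM

print(bom_utf8_len, bom_utf16_len)
```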

    I ruled out wrong C++ code to extract the string from the Python
    variable, since running a Python print of the content of mystdout in
    the real stdout also misses the first two characters.

    Hopefully someone has a clue about what changed in Python for this to
    stop working compared to Python 3.7?
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Olivier B.@perso.olivier.barthelemy@gmail.com to comp.lang.python on Thu Apr 11 15:09:32 2024
    From Newsgroup: comp.lang.python

    Partly answering my own question:
    For some reason, I now have to call mystdout.seek(0) right after
    mystdout has been created, and this solves the issue.
    No idea why, though.
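
    As an illustration of why the stream position matters, here is a
    minimal sketch in plain Python. It does not explain why the position
    would be off in the embedded case; it only shows that an offset of 2
    produces the same symptom, and that getvalue() sidesteps the position
    entirely.

```python
from io import StringIO

buf = StringIO()
buf.write("patate\n")

# read() starts at the current position, which is the end after a write.
at_end = buf.read()

# Reading from position 2 skips the first two characters.
buf.seek(2)
from_two = buf.read()

# getvalue() ignores the position and returns the whole buffer.
whole = buf.getvalue()

print(repr(at_end), repr(from_two), repr(whole))
```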
  • From Thomas Passin@list1@tompassin.net to comp.lang.python on Thu Apr 11 09:19:09 2024
    From Newsgroup: comp.lang.python

    I've not used the C API, so just for fun I asked ChatGPT about this and
    it suggested that a flush after writing to StringIO might do it. It
    suggested using a custom class for this purpose:

    from io import StringIO

    class MyStringIO(StringIO):
        def write(self, s):
            # Flush after every write; return the character count,
            # as StringIO.write does
            n = super().write(s)
            self.flush()
            return n

    You would use it like this:

    sys.stdout = mystdout = MyStringIO()

    I haven't tested it but it seems reasonable, although I would have
    naively expected to lose bytes from the end, not the beginning.
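
    A quick way to check the flush idea in plain Python is sketched below.
    It only covers the pure-Python case, not the embedded one: text written
    to a StringIO is held in memory, and in this sketch getvalue() returns
    the same contents before and after flush().

```python
from io import StringIO

buf = StringIO()
buf.write("patate\n")

# Contents are already visible before any flush.
before_flush = buf.getvalue()

buf.flush()
after_flush = buf.getvalue()

print(before_flush == after_flush)
```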
  • From Cameron Simpson@cs@cskk.id.au to comp.lang.python on Fri Apr 12 07:55:55 2024
    From Newsgroup: comp.lang.python

    On 11Apr2024 14:42, Olivier B. <perso.olivier.barthelemy@gmail.com> wrote:
    > I am trying to use StringIO to capture stdout, in code that looks like this:
    >
    > import sys
    > from io import StringIO
    > old_stdout = sys.stdout
    > sys.stdout = mystdout = StringIO()
    > print( "patate")
    > mystdout.seek(0)
    > sys.stdout = old_stdout
    > print(mystdout.read())
    >
    > Well, it is not exactly like this, since this works properly.

    Aye, I just tried that. All good.

    > This code is actually run from C++ using the Python C API.
    > This worked quite well, so the code was right at some point. But now,
    > two things have changed:
    > - Now using Python 3.11.7 instead of 3.7.12
    > - Now using only the Python limited C API

    Maybe you should post the code then: the exact Python code and the exact
    C++ code.

    > And it seems that now, mystdout.read() always misses the first two
    > characters that have been written to stdout.
    >
    > My first idea was that a BOM was being improperly truncated at some
    > point, but I am manipulating UTF-8, so the BOM would be 3 bytes,
    > not 2.

    I didn't think UTF-8 needed a BOM. Someone will doubtless correct me.

    However, does the `mystdout.read()` code _know_ you're using UTF-8? I
    have the vague impression that, e.g., some Windows systems default to
    UTF-16 of some flavour, possibly _with_ a BOM.

    I'm suggesting that you rigorously check that the bytes->text bits know
    what text encoding they're using. If you've left an encoding out
    anywhere, put it in explicitly.

    > Hopefully someone has a clue about what changed in Python for this to
    > stop working compared to Python 3.7?

    None at all, alas. My experience with the Python C API is very limited.
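
    The advice above about naming the encoding explicitly can be sketched
    as follows. This is only an illustration under assumptions: the BytesIO
    buffer and the variable names are hypothetical stand-ins, since the
    actual C++ and Python code was not posted.

```python
import io

# Hypothetical byte sink standing in for wherever the bytes really go.
raw = io.BytesIO()

# State the encoding instead of relying on the platform default.
out = io.TextIOWrapper(raw, encoding="utf-8", newline="\n")
out.write("patate\n")
out.flush()  # TextIOWrapper buffers, so flush before reading the bytes

data = raw.getvalue()
text = data.decode("utf-8")  # decode with the same explicit encoding

print(data, repr(text))
```

    With the encoding stated on both the write and the decode side, neither
    step depends on a locale or platform default.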