Re: How to read in a (long) UTF-8 file, incrementally?
From
Niocláisín Cóilín de Ghlostéir@Spamassassin@irrt.De to
comp.lang.ada on Sun Aug 24 22:57:38 2025
From Newsgroup: comp.lang.ada
Doctor Marius Amado-Alves wrote on 2nd November 2021:
|----------------------------------------------------------------------------|
|"As I understand it, to work with Unicode text inside the program it is     |
|better to use the Wide_Wide (UTF-32) variants of everything.                |
|                                                                            |
|Now, Unicode files usually are in UTF-8.                                    |
|                                                                            |
|One solution is to read the entire file in one gulp to a String, then       |
|convert to Wide_Wide. This solution is not memory efficient, and it may not |
|be possible in some tasks e.g. real time processing of lines of text.       |
|                                                                            |
|If the files has lines, I guess we can also work line by line (Text_IO). But|
|the text may not have lines. Can be a long XML object, for example.         |
|                                                                            |
|So it should be possible to read a single UTF-8 character, right? Which     |
|might be 1, 2, 3, or 4 bytes long, so it must be read into a String, right? |
|Or directly to Wide_Wide. Are there such functions?                         |
|                                                                            |
|Thanks a lot."                                                              |
|----------------------------------------------------------------------------|
Timings can be significantly affected when each character occupies an unknown, varying quantity of octets, as in UTF-8.
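
For what it is worth, the quoted question (reading one UTF-8 character at a time, straight into a Wide_Wide_Character) can be sketched with Ada.Streams.Stream_IO alone. This is only a minimal illustration under stated assumptions, not a recommended library routine: the names Count_UTF8_Characters, Next_Octet, Get and the local Encoding_Error exception are invented for this example, the file name is taken from the first command-line argument, and the decoder checks lead and continuation octets but does not reject overlong forms or surrogates.

```ada
with Ada.Command_Line;
with Ada.Streams.Stream_IO;
with Ada.Text_IO;

procedure Count_UTF8_Characters is
   use Ada.Streams.Stream_IO;

   --  Hypothetical local exception for malformed input.
   Encoding_Error : exception;

   --  Read a single octet from the file as a number in 0 .. 255.
   function Next_Octet (File : File_Type) return Natural is
      C : Character;
   begin
      Character'Read (Stream (File), C);
      return Character'Pos (C);
   end Next_Octet;

   --  Read exactly one UTF-8 encoded character (1 to 4 octets).
   --  The lead octet says how many continuation octets follow.
   function Get (File : File_Type) return Wide_Wide_Character is
      Lead  : constant Natural := Next_Octet (File);
      Code  : Natural;
      Extra : Natural;  --  continuation octets still to read
   begin
      if Lead < 16#80# then
         Code  := Lead;
         Extra := 0;
      elsif Lead in 16#C2# .. 16#DF# then
         Code  := Lead - 16#C0#;
         Extra := 1;
      elsif Lead in 16#E0# .. 16#EF# then
         Code  := Lead - 16#E0#;
         Extra := 2;
      elsif Lead in 16#F0# .. 16#F4# then
         Code  := Lead - 16#F0#;
         Extra := 3;
      else
         raise Encoding_Error;
      end if;

      for I in 1 .. Extra loop
         declare
            Octet : constant Natural := Next_Octet (File);
         begin
            if Octet not in 16#80# .. 16#BF# then
               raise Encoding_Error;
            end if;
            --  Each continuation octet contributes six bits.
            Code := Code * 64 + (Octet - 16#80#);
         end;
      end loop;

      return Wide_Wide_Character'Val (Code);
   end Get;

   File      : File_Type;
   Total     : Natural := 0;
   Non_ASCII : Natural := 0;
   Item      : Wide_Wide_Character;
begin
   Open (File, In_File, Ada.Command_Line.Argument (1));
   while not End_Of_File (File) loop
      Item  := Get (File);
      Total := Total + 1;
      if Wide_Wide_Character'Pos (Item) > 127 then
         Non_ASCII := Non_ASCII + 1;
      end if;
   end loop;
   Close (File);
   Ada.Text_IO.Put_Line
     (Natural'Image (Total) & " characters," &
      Natural'Image (Non_ASCII) & " outside ASCII");
end Count_UTF8_Characters;
```

Only one character is held in memory at a time, so the file need not contain line terminators. The number of octet reads per call to Get varies from 1 to 4, which is exactly the kind of variation that matters for timing analysis.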