author | Konstantin Aladyshev <aladyshev22@gmail.com> | 2021-07-14 00:39:32 +0300 |
---|---|---|
committer | Konstantin Aladyshev <aladyshev22@gmail.com> | 2021-07-14 00:42:41 +0300 |
commit | 418cd8555d3a8963229ccb22ca85c2af4bfb13b2 (patch) | |
tree | 2214bae1b1dfb8852130d902928404d62fa5f490 /Lessons/Lesson_31/README.md | |
parent | c25ac9da7d848f722ee7b9ba6121639480030a82 (diff) | |
Add information about UTF-8 parsing
Signed-off-by: Konstantin Aladyshev <aladyshev22@gmail.com>
Diffstat (limited to 'Lessons/Lesson_31/README.md')
-rw-r--r-- | Lessons/Lesson_31/README.md | 15 |
1 files changed, 15 insertions, 0 deletions
diff --git a/Lessons/Lesson_31/README.md b/Lessons/Lesson_31/README.md
index fdc22de..16251f7 100644
--- a/Lessons/Lesson_31/README.md
+++ b/Lessons/Lesson_31/README.md
@@ -505,3 +505,18 @@ PCI Root Bridge 2
 If you are interested check out this link to know more about all these QEMU parameters https://blogs.oracle.com/linux/post/a-study-of-the-linux-kernel-pci-subsystem-with-qemu
+# UTF-8
+
+In this lesson we've parsed the `pci.ids` file as an ASCII file, but it is actually encoded in UTF-8.
+
+```
+$ file ~/UEFI_disk/pci.ids
+/home/kostr/UEFI_disk/pci.ids: UTF-8 Unicode text, with very long lines
+```
+
+But since the `pci.ids` file consists mostly of ASCII characters, it was fine to treat it as ASCII.
+
+We've used this simplification because it is hard to parse UTF-8 data in UEFI, which doesn't have any native support for this encoding.
+
+The only way to parse UTF-8 is to deserialize the UTF-8 bytes to Unicode code points and then serialize those to UCS-2 (CHAR16). If you really want to do it, you can utilize some conversion code from the terminal driver (https://github.com/tianocore/edk2/blob/master/MdeModulePkg/Universal/Console/TerminalDxe/Vtutf8.c).
+
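
The two-step conversion described in the added README text (decode UTF-8 bytes to Unicode code points, then store each code point as a UCS-2 `CHAR16`) could look roughly like the sketch below. This is a minimal illustration in portable C, not the code from `Vtutf8.c`: the names `Ucs2Char`, `Utf8DecodeOne` and `Utf8ToUcs2` are hypothetical, `uint16_t` stands in for the EDK2 `CHAR16` type, and replacing undecodable or non-BMP input with U+FFFD is an assumption made here for simplicity.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: stand-in for the EDK2 CHAR16 type so the sketch builds outside EDK2. */
typedef uint16_t Ucs2Char;

/* Assumption: malformed input and non-BMP code points become U+FFFD. */
#define UCS2_REPLACEMENT 0xFFFDu

/*
 * Step 1: decode one UTF-8 sequence into a Unicode code point.
 * Returns the number of bytes consumed, or 0 if the sequence is
 * truncated or malformed.
 */
static size_t Utf8DecodeOne(const uint8_t *Src, size_t Len, uint32_t *CodePoint)
{
  /* Smallest code point that legitimately needs 0..3 continuation bytes. */
  static const uint32_t MinValue[] = { 0x0, 0x80, 0x800, 0x10000 };
  uint32_t Cp;
  size_t   Need;   /* number of continuation bytes expected */
  size_t   Index;

  if (Len == 0) {
    return 0;
  }

  if (Src[0] < 0x80) {
    Cp = Src[0];        Need = 0;   /* 0xxxxxxx: plain ASCII */
  } else if ((Src[0] & 0xE0) == 0xC0) {
    Cp = Src[0] & 0x1F; Need = 1;   /* 110xxxxx */
  } else if ((Src[0] & 0xF0) == 0xE0) {
    Cp = Src[0] & 0x0F; Need = 2;   /* 1110xxxx */
  } else if ((Src[0] & 0xF8) == 0xF0) {
    Cp = Src[0] & 0x07; Need = 3;   /* 11110xxx */
  } else {
    return 0;                       /* stray continuation or invalid lead byte */
  }

  if (Len < Need + 1) {
    return 0;                       /* truncated sequence */
  }

  for (Index = 1; Index <= Need; Index++) {
    if ((Src[Index] & 0xC0) != 0x80) {
      return 0;                     /* continuation bytes must be 10xxxxxx */
    }
    Cp = (Cp << 6) | (uint32_t)(Src[Index] & 0x3F);
  }

  /* Reject overlong encodings, surrogates and values above U+10FFFF. */
  if (Cp < MinValue[Need] || Cp > 0x10FFFF || (Cp >= 0xD800 && Cp <= 0xDFFF)) {
    return 0;
  }

  *CodePoint = Cp;
  return Need + 1;
}

/*
 * Step 2: store each code point as one UCS-2 character.
 * Writes a NUL-terminated string into Dst (capacity DstCap characters)
 * and returns the number of characters written, excluding the terminator.
 */
static size_t Utf8ToUcs2(const uint8_t *Src, size_t Len, Ucs2Char *Dst, size_t DstCap)
{
  size_t In  = 0;
  size_t Out = 0;

  if (DstCap == 0) {
    return 0;
  }

  while (In < Len && Out + 1 < DstCap) {
    uint32_t Cp;
    size_t   Used = Utf8DecodeOne(Src + In, Len - In, &Cp);

    if (Used == 0) {
      Cp   = UCS2_REPLACEMENT;      /* skip one bad byte and resynchronize */
      Used = 1;
    } else if (Cp > 0xFFFF) {
      Cp = UCS2_REPLACEMENT;        /* UCS-2 cannot represent non-BMP characters */
    }

    Dst[Out++] = (Ucs2Char)Cp;
    In += Used;
  }

  Dst[Out] = 0;
  return Out;
}
```

In an actual UEFI application the buffer read from `pci.ids` would be passed as `Src`, and the resulting string could then be handled with the usual `CHAR16` routines. Since UCS-2 only covers the Basic Multilingual Plane, any character above U+FFFF is lost in this sketch.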