author Konstantin Aladyshev <aladyshev22@gmail.com> 2021-07-14 00:39:32 +0300
committer Konstantin Aladyshev <aladyshev22@gmail.com> 2021-07-14 00:42:41 +0300
commit 418cd8555d3a8963229ccb22ca85c2af4bfb13b2 (patch)
tree 2214bae1b1dfb8852130d902928404d62fa5f490
parent c25ac9da7d848f722ee7b9ba6121639480030a82 (diff)
Add information about UTF-8 parsing
Signed-off-by: Konstantin Aladyshev <aladyshev22@gmail.com>
-rw-r--r-- Lessons/Lesson_31/README.md 15
1 files changed, 15 insertions, 0 deletions
diff --git a/Lessons/Lesson_31/README.md b/Lessons/Lesson_31/README.md
index fdc22de..16251f7 100644
--- a/Lessons/Lesson_31/README.md
+++ b/Lessons/Lesson_31/README.md
@@ -505,3 +505,18 @@ PCI Root Bridge 2
If you are interested check out this link to know more about all these QEMU parameters https://blogs.oracle.com/linux/post/a-study-of-the-linux-kernel-pci-subsystem-with-qemu
+# UTF-8
+
+In this lesson we've parsed the `pci.ids` file as an ASCII file, but it is actually encoded in UTF-8.
+
+```
+$ file ~/UEFI_disk/pci.ids
+/home/kostr/UEFI_disk/pci.ids: UTF-8 Unicode text, with very long lines
+```
+
+But since the `pci.ids` file consists mostly of ASCII characters, it was fine to treat it as ASCII.
+
+We've used this simplification because UTF-8 data is hard to parse in UEFI: the firmware has no native support for this encoding.
+
+The only way to parse UTF-8 is to deserialize the UTF-8 bytes to Unicode code points and then serialize those to UCS-2 (CHAR16). If you really want to do it, you can reuse some of the conversion code from the terminal driver (https://github.com/tianocore/edk2/blob/master/MdeModulePkg/Universal/Console/TerminalDxe/Vtutf8.c).
+
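The two-step conversion described above (UTF-8 bytes to Unicode code points, then code points to UCS-2) could look roughly like the sketch below. It is plain C, not EDK2 code and not taken from `Vtutf8.c`: the names `utf8_decode_one` and `utf8_to_ucs2` are hypothetical, and `uint16_t` stands in for `CHAR16`. Malformed sequences and code points above U+FFFF (which UCS-2 cannot represent) are replaced with U+FFFD, which is just one possible policy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Substitute for anything malformed or not representable in UCS-2. */
#define UCS2_REPLACEMENT 0xFFFDu

/* Hypothetical helper: decode one UTF-8 sequence from src (len >= 1 bytes
 * available). Stores the code point in *cp and returns the number of bytes
 * consumed. A bad lead or continuation byte consumes a single byte. */
static size_t utf8_decode_one(const uint8_t *src, size_t len, uint32_t *cp)
{
    uint8_t  b0 = src[0];
    size_t   need, i;
    uint32_t value, min;

    if (b0 < 0x80) {                      /* 0xxxxxxx: plain ASCII */
        *cp = b0;
        return 1;
    } else if ((b0 & 0xE0) == 0xC0) {     /* 110xxxxx: 2-byte sequence */
        need = 1; value = b0 & 0x1F; min = 0x80;
    } else if ((b0 & 0xF0) == 0xE0) {     /* 1110xxxx: 3-byte sequence */
        need = 2; value = b0 & 0x0F; min = 0x800;
    } else if ((b0 & 0xF8) == 0xF0) {     /* 11110xxx: 4-byte sequence */
        need = 3; value = b0 & 0x07; min = 0x10000;
    } else {                              /* stray continuation or invalid byte */
        *cp = UCS2_REPLACEMENT;
        return 1;
    }

    if (len < need + 1) {                 /* sequence truncated by end of input */
        *cp = UCS2_REPLACEMENT;
        return 1;
    }
    for (i = 1; i <= need; i++) {
        if ((src[i] & 0xC0) != 0x80) {    /* continuation must be 10xxxxxx */
            *cp = UCS2_REPLACEMENT;
            return 1;
        }
        value = (value << 6) | (uint32_t)(src[i] & 0x3F);
    }
    /* Reject overlong encodings, surrogates and out-of-range values. */
    if (value < min || value > 0x10FFFF || (value >= 0xD800 && value <= 0xDFFF)) {
        value = UCS2_REPLACEMENT;
    }
    *cp = value;
    return need + 1;
}

/* Hypothetical converter: writes at most dst_cap - 1 UCS-2 characters plus a
 * terminating 0 into dst and returns the number of characters written. */
size_t utf8_to_ucs2(const uint8_t *src, size_t src_len,
                    uint16_t *dst, size_t dst_cap)
{
    size_t in = 0, out = 0;

    assert(src != NULL && dst != NULL && dst_cap > 0);
    while (in < src_len && out + 1 < dst_cap) {
        uint32_t cp;
        in += utf8_decode_one(src + in, src_len - in, &cp);
        /* UCS-2 covers only U+0000..U+FFFF; anything above is substituted. */
        dst[out++] = (cp > 0xFFFF) ? (uint16_t)UCS2_REPLACEMENT : (uint16_t)cp;
    }
    dst[out] = 0;
    return out;
}
```

In a UEFI application the output buffer would be a `CHAR16` array, so the result could be passed to `Print` with `%s`. For `pci.ids` the conversion would presumably be done line by line after reading the file.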