FO.NET does not yet support Unicode characters

Apr 30, 2009 at 5:48 AM
The FAQ says:

Q: Does FO.NET support Unicode? For example, can I use special characters like omega?
A: FO.NET can handle any code point supported by Unicode and by the current font in use.

Unfortunately, this is not true. No CJK (Chinese-Japanese-Korean) characters work; each one is displayed as either an empty space or a '#'.

I debugged deep into the source code and finally ended up at this comment in the MapCharacter method of TrueTypeFont.cs:

// TrueType fonts only support the Basic and Extended Latin blocks

The TrueType font encoding is also hardcoded to "WinAnsiEncoding", which I don't think is going to work.

Coordinator
Jun 8, 2009 at 5:20 PM

Hi there

Are you specifying a font that contains the CJK glyphs, and have you instructed FO.NET to either subset (preferred) or embed the font? This should work. It is possible to show any Unicode character in FO.NET. If this does not help then I will create a sample for you.

Cheers

Mark

Nov 11, 2009 at 1:20 PM

Hi Mark,

Following up on the above, I am trying to write out a pdf file in Latvian and have the same problem as described by llyzs. Can you send me the sample or post it?

Thanks, Steve

Mar 3, 2010 at 2:25 AM

Can you send me the sample? I need to use Chinese characters.

Thanks

Mar 4, 2010 at 1:12 PM

Hi Zhuibobo,

Hope this helps:

    // Load the XSL-FO document
    XmlDocument fo = (XmlDocument)...;

    // Generate the PDF using Apoc
    PdfRendererOptions options = new PdfRendererOptions();
    options.FontType = FontType.Subset;
    options.Kerning = false;

    FileInfo pdfFile = new FileInfo( Server.MapPath( "xxx.pdf" ) );
    FileStream pdfStream = pdfFile.OpenWrite();

    ApocDriver driver = ApocDriver.Make();
    driver.Options = options;
    driver.Render(fo, pdfStream);
    pdfStream.Close();
    pdfStream.Dispose();

My previous mistake was missing out the crucial line driver.Options = options;

Dec 1, 2010 at 1:09 AM

Hi, I've been having issues with Unicode (UTF-8, to be more precise) in FO.NET as well. If I try to enter any non-English characters, they still won't process correctly... Also, I've had to modify your code as the ApocDriver is not available... I assumed you meant to use the FonetDriver.

Here's my modification of your helloworld:

using System;
using System.IO;
using System.Data;
using System.Xml;
using Fonet;
using Fonet.Render;
using Fonet.Render.Pdf;

namespace FonetExample {
    class HelloWorld {
        private static string pdfPath;
        private static string pdfFilename;
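        // Maximum number of numbered copies to try; left unset it defaults to 0,
        // which makes pdfFileInit throw as soon as the target file already exists.
        private static long maxFiles = 100;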

        static void pdfFileInit(string filename, string path = null) {
            long i;

            pdfFilename = filename;
            pdfPath = String.Format("{0}{1}PDF", path??Directory.GetCurrentDirectory(), Path.DirectorySeparatorChar);
            if (!Directory.Exists(pdfPath)) {
                Directory.CreateDirectory(pdfPath);
            }
            pdfPath += Path.DirectorySeparatorChar;
            pdfFilename = String.Format("{0}.pdf", filename);

            if (File.Exists(pdfPath + pdfFilename)) {
                for (i = 0; i < maxFiles; i++) {
                    if (File.Exists(pdfPath + pdfFilename)) {
                        pdfFilename = String.Format("{0}({1}).pdf", filename, i);
                    } else {
                        break;
                    }
                }

                if (i >= maxFiles) {
                    throw new FileLoadException("Too many files in the directory. Aborting.");
                }
            }
        }

        static void Main(string[] args) {
            pdfFileInit("hello", "../..");

            // Initial HelloWorld
            //FonetDriver driver = FonetDriver.Make();
            //driver.Render("../../hello.fo", "../../hello.pdf");

            // Internationalisation
            // Load the XSL-FO document
            XmlDocument fo = new XmlDocument();
            fo.Load("../../hello.fo");

            // Generate the PDF using FO.NET
            PdfRendererOptions options = new PdfRendererOptions();
            options.FontType = FontType.Subset;
            options.Kerning = false;
            
            FileInfo pdfFile = new FileInfo( pdfPath + pdfFilename );
            FileStream pdfStream = pdfFile.OpenWrite();

            FonetDriver driver = FonetDriver.Make();
            driver.Options = options;
            driver.Render(fo, pdfStream);
            pdfStream.Close();
            pdfStream.Dispose();
        }
    }
}

I'm investigating why FO.NET has this issue and am arriving at a similar conclusion to llyzs: "WinAnsiEncoding" is hardcoded throughout the calls to the CodePointMapping.GetMapping(...) methods.

In my case, though, the issue has its roots in the MapCharacter() method in Base14Font.cs. It's as if the other provided mappings aren't being used at all and only "WinAnsiEncoding" is being called...

Thanks for your help so far and I hope you can help us to resolve this...

Dec 1, 2010 at 1:53 AM

I've modified the MapCharacter method in FontState.cs (below) to simply try all of the encodings and see if any of them returns something non-zero. None of them did for my test string of "Здраво свете!", which is in Cyrillic script.

Modified method:
public ushort MapCharacter(char c)
{
    /*if (metric is Font)
    {
        return ((Font)metric).MapCharacter(c);
    }*/

    ushort charIndex = CodePointMapping.GetMapping("WinAnsiEncoding").MapCharacter(c);
    if (charIndex != 0)
    {
        return charIndex;
    }
    else
    {
        System.Collections.Generic.List<string> encodingsToTry = new System.Collections.Generic.List<string>();
        System.Collections.Generic.List<ushort> charsGotten = new System.Collections.Generic.List<ushort>();

        encodingsToTry.Add("StandardEncoding");
        encodingsToTry.Add("ISOLatin1Encoding");
        encodingsToTry.Add("CEEncoding");
        encodingsToTry.Add("MacRomanEncoding");
        encodingsToTry.Add("WinAnsiEncoding");
        encodingsToTry.Add("PDFDocEncoding");
        encodingsToTry.Add("SymbolEncoding");
        encodingsToTry.Add("ZapfDingbatsEncoding");

        foreach (string enc in encodingsToTry) {
            charsGotten.Add(CodePointMapping.GetMapping(enc).MapCharacter(c));
        }

        foreach (ushort ch in charsGotten) {
            if (ch != 0) {
                return ch;
            }
        }

        return (ushort)'#';
    }
}
Coordinator
Dec 1, 2010 at 9:24 AM

Hi

Are you specifying a TrueType font in the FO file?  If you could post your FO file as well then I will take a look.

Mark

Dec 1, 2010 at 9:15 PM

Here it is:

<?xml version="1.0" encoding="utf-8"?>
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">

  <fo:layout-master-set>
    <fo:simple-page-master master-name="simple"
                  page-height="29.7cm"
                  page-width="21cm"
                  margin-top="1cm"
                  margin-bottom="2cm"
                  margin-left="2.5cm"
                  margin-right="2.5cm">
      <fo:region-body margin-top="3cm"/>
      <fo:region-before extent="3cm"/>
      <fo:region-after extent="1.5cm"/>
    </fo:simple-page-master>
  </fo:layout-master-set>

  <fo:page-sequence master-reference="simple">
    <fo:flow flow-name="xsl-region-body">
      <fo:block font-size="18pt" color="black" text-align="center">
        Hello, World! Здраво свете!
      </fo:block>
    </fo:flow>
  </fo:page-sequence>

</fo:root>


Filename was "hello.fo".

Thanks again.

Coordinator
Dec 3, 2010 at 11:31 AM

Hi

The problem is that you are not specifying a font that can handle those characters. If no font is specified, then FO.NET will default to Helvetica and you are restricted to codepage 1252. The same behaviour happens if you link rather than embed your font (see: http://fonet.codeplex.com/wikipage?title=Font%20Linking%2c%20Embedding%20and%20Subsetting&referringTitle=Section%204%3a%20Font%20Support). I can't remember the exact details, but I remember this was done to ensure we comply with the PDF specification v1.4.

If you back out your modifications and then retry with font-family="Arial" specified on the <fo:block/>, it will work OK.

Let me know if you still have problems.

Mark

Coordinator
Dec 4, 2010 at 1:19 PM

Hi

I had a few spare minutes this weekend and tried this out. I added font-family="Arial" to the <fo:block/> tag and rendered the PDF using the command line tool (which essentially does the same as your code). The command line I used was:

fonet -fonttype Subset -fo test.fo -pdf test.pdf

The resulting PDF displayed as expected.  This makes sense if you think about it.  For the PDF to be truly portable, then you must embed the font if you use characters that are not natively handled by the PDF reader (or cannot be easily mapped to characters natively handled by the PDF reader).

In a nutshell, you need to either subset or embed the font.

Hope this helps

Mark

Dec 7, 2010 at 2:05 AM

Thanx! That worked with that small sample program.

My last question would then be:

Does that mean that I cannot use the Helvetica font? How would I use the system fonts instead of the built-in ones?

You've helped me quite a bit already. Thanks

Coordinator
Dec 7, 2010 at 7:36 AM

Hi

You can't use the Helvetica font that is built into PDF readers, since the PDF specification only supports certain character sets.  See Appendix D in the PDF specification v1.4 (I have not studied later versions of the specification so I am not sure if the character sets were expanded in later versions.  In any case, FO.NET supports v1.4 only).

If you have a system font called "Helvetica", then FO.NET will ignore it when enumerating the available fonts (see FontSetup.cs).  We did not ever need to change this behaviour since we tended to use Arial as a replacement for Helvetica (IIRC, they are virtually identical but I could be wrong).

If using Arial is not an option, then you could change the name of the built-in PDF Helvetica font (the built-in fonts are known as Base 14 fonts within FO.NET).  This would involve minor changes to the Helvetica*.cs files and FontSetup.cs file.

Good luck!

Mark


 

Feb 16, 2011 at 12:06 PM

Hi Mark,

Had to come back to this as I still haven't got to the bottom of it. The example above works well for Cyrillic, but not for Chinese. I tried the text:

Hello, World! Здраво свете! 你好世界!

but the Chinese comes out as rectangular blocks (the Cyrillic is okay).

Hope you can help,

Steve

Coordinator
Feb 17, 2011 at 6:16 AM

Hi Stephen

Can you share a small example fo file that highlights the problem?

Cheers

Mark

Feb 17, 2011 at 8:36 AM

Hi Mark,

I used the same file as above:

<?xml version="1.0" encoding="utf-8"?>
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">

  <fo:layout-master-set>
    <fo:simple-page-master master-name="simple"
                  page-height="29.7cm"
                  page-width="21cm"
                  margin-top="1cm"
                  margin-bottom="2cm"
                  margin-left="2.5cm"
                  margin-right="2.5cm">
      <fo:region-body margin-top="3cm"/>
      <fo:region-before extent="3cm"/>
      <fo:region-after extent="1.5cm"/>
    </fo:simple-page-master>
  </fo:layout-master-set>

  <fo:page-sequence master-reference="simple">
    <fo:flow flow-name="xsl-region-body">
      <fo:block font-size="18pt" color="black" text-align="center" font-family="Arial">
        Hello, World! Здраво свете! 你好世界!
      </fo:block>
    </fo:flow>
  </fo:page-sequence>

</fo:root>

The Cyrillic is fine but not the Chinese - it displays in the pdf as rectangles. (I also tried splitting the languages into separate fo:blocks but this had no effect).

The C# I use is the same as I posted on March 4 last year.

Cheers

Steve

Coordinator
Feb 19, 2011 at 8:49 AM

Hi Steve

The problem is that the default Arial font does not support these Chinese characters. FO.NET is not as clever as Microsoft Word or Internet Explorer (both of which seem to detect the out-of-range characters and automatically substitute an alternative font).

You can see this if you copy and paste the FO sample you provided into Word. The Chinese characters will take on the MS Gothic font. In fact, if you change the FO to this:

<fo:block font-size="18pt" color="black" text-align="center" font-family="Arial">
Hello, World! Здраво свете! <fo:inline font-family="MS Gothic">你好世界</fo:inline>!
</fo:block>

Then the characters get rendered OK in the PDF. Another approach is to use the Arial Unicode MS font (http://en.wikipedia.org/wiki/Arial_Unicode_MS):

<fo:block font-size="18pt" color="black" text-align="center" font-family="Arial Unicode MS">
Hello, World! Здраво свете! 你好世界!
</fo:block>

You will want to make sure you use font subsetting rather than embedding, otherwise your PDF file sizes will become huge. I think the Arial Unicode MS file is over 20 MB in size, but subsetting will just pull out the actual characters (actually, glyphs rather than characters from a technical point of view) that you are using.

Hope this helps
Mark
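
Since FO.NET does not substitute fonts by itself, the manual fo:inline switch shown above could be automated before the text goes into the FO document. The following is only a minimal sketch under stated assumptions: the class and method names (FontFallback, WrapCjkRuns) are hypothetical and not part of FO.NET, only the CJK Unified Ideographs block (U+4E00 to U+9FFF) is detected, and the font family name is assumed to contain no quote characters.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of FO.NET.
public static class FontFallback
{
    // True for characters in the CJK Unified Ideographs block (U+4E00..U+9FFF).
    // Kana, Hangul and the extension blocks would need additional ranges.
    public static bool IsCjkIdeograph(char c)
    {
        return c >= '\u4E00' && c <= '\u9FFF';
    }

    // Takes plain text and returns an XML fragment in which every run of
    // CJK ideographs is wrapped in <fo:inline font-family="...">.
    // The characters &, < and > in the input are escaped.
    public static string WrapCjkRuns(string text, string cjkFontFamily)
    {
        var sb = new StringBuilder();
        bool inRun = false;
        foreach (char c in text)
        {
            bool cjk = IsCjkIdeograph(c);
            if (cjk && !inRun)
            {
                sb.Append("<fo:inline font-family=\"").Append(cjkFontFamily).Append("\">");
            }
            else if (!cjk && inRun)
            {
                sb.Append("</fo:inline>");
            }
            inRun = cjk;
            AppendEscaped(sb, c);
        }
        if (inRun)
        {
            sb.Append("</fo:inline>");
        }
        return sb.ToString();
    }

    private static void AppendEscaped(StringBuilder sb, char c)
    {
        switch (c)
        {
            case '&': sb.Append("&amp;"); break;
            case '<': sb.Append("&lt;"); break;
            case '>': sb.Append("&gt;"); break;
            default: sb.Append(c); break;
        }
    }
}
```

The returned fragment uses the fo: prefix, so it has to be placed inside an element where that namespace is declared, for example as the content of an fo:block when the FO document is assembled as a string.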

Feb 21, 2011 at 1:11 PM

Hi Mark,

Once again, thanks for your help. That resolves the problem.

Steve

Apr 22, 2011 at 2:02 AM

Hi

I have a problem with word wrapping of Chinese text.

It doesn't seem to wrap properly in the PDF:

<fo:block font-family="MS Gothic">中文您好中文您好中文您好中文您好中文您好中文您好中文您好中文您好中文您好中文您好中文您好中文您好中文您好中文您好中文您好中文您好</fo:block>

Ken

Coordinator
Aug 5, 2011 at 8:48 AM

Hi Ken. FO.NET cannot word wrap unless there are spaces within the text. This is an unfortunate limitation of the rendering engine.
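
Given that limitation, one possible workaround is to add break opportunities to the text before it is rendered. The sketch below is an assumption-laden illustration, not an FO.NET feature: the class and method names are hypothetical, and only characters in the CJK Unified Ideographs block (U+4E00 to U+9FFF) are considered. Passing an ordinary space as the separator matches the behaviour described above but leaves visible gaps; passing a zero-width space ("\u200B") would avoid the gaps only if FO.NET treats that character as a break opportunity, which has not been verified here.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of FO.NET.
public static class CjkBreaks
{
    // True for characters in the CJK Unified Ideographs block (U+4E00..U+9FFF).
    public static bool IsCjkIdeograph(char c)
    {
        return c >= '\u4E00' && c <= '\u9FFF';
    }

    // Inserts the separator between every pair of adjacent CJK ideographs.
    // All other text, including existing spaces, is left unchanged.
    public static string InsertBetweenIdeographs(string text, string separator)
    {
        var sb = new StringBuilder(text.Length * 2);
        for (int i = 0; i < text.Length; i++)
        {
            if (i > 0 && IsCjkIdeograph(text[i - 1]) && IsCjkIdeograph(text[i]))
            {
                sb.Append(separator);
            }
            sb.Append(text[i]);
        }
        return sb.ToString();
    }
}
```

For example, InsertBetweenIdeographs("中文您好", " ") returns "中 文 您 好", which gives the renderer a space at every position where a line may be broken.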

Oct 27, 2011 at 7:41 AM

Hi there,

I'm trying to create a PDF document with Chinese characters in it, using the "Arial Unicode MS" font. I'm setting font-family="Arial Unicode MS" and using the following options:

PdfRendererOptions options = new PdfRendererOptions();
options.FontType = FontType.Subset;
options.Kerning = false;

Everything is working on my local workstation, but when I try it out on my server it does not work.

I have manually copied the "Arial Unicode MS" font to the server as I do not have Microsoft Office installed there. I can see and use the font in WordPad, but it does not seem to be picked up by my application when rendering the document.

brgds

Tom

Oct 27, 2011 at 7:21 PM

OK, after a long time and much hair loss, I found out I had to reboot my Windows 2008 server.

Problem fixed!

/Tom

 

Aug 21, 2012 at 6:12 PM

Note that this doesn't work for Korean characters. I'm sure it's a bug in the program:

System.ArgumentOutOfRangeException: Index was out of range. Must be non-negative and less than the size of the collection.
Parameter name: index
   at System.Collections.ArrayList.get_Item(Int32 index)
   at Fonet.Pdf.Gdi.Font.IndexToLocationTable.get_Item(Int32 index)
   at Fonet.Pdf.Gdi.Font.GlyphReader.ReadGlyph(Int32 glyphIndex)
   at Fonet.Pdf.Gdi.Font.GlyfDataTable.Read(FontFileReader reader)
   at Fonet.Pdf.Gdi.Font.FontFileReader.GetTable(String tableName)
   at Fonet.Pdf.Gdi.Font.FontFileReader.GetGlyfDataTable()
   at Fonet.Pdf.Gdi.Font.FontSubset.Generate(MemoryStream output)
   at Fonet.Render.Pdf.Fonts.Type2CIDSubsetFont.get_FontData()
   at Fonet.Pdf.PdfFontCreator.CreateCIDFont(String pdfFontID, Font font, CIDFont cidFont)
   at Fonet.Pdf.PdfFontCreator.MakeFont(String pdfFontID, Font font)
   at Fonet.Render.Pdf.FontSetup.AddToResources(PdfFontCreator fontCreator, PdfResources resources)
   at Fonet.Render.Pdf.PdfRenderer.StopRenderer()
   at Fonet.StreamRenderer.StopRenderer()
   at Fonet.Fo.FOTreeBuilder.Parse(XmlReader reader)

 

Japanese and Chinese work. Korean works if the font type is set to "Embed", but "Subset" gives me this error. I was using "Arial Unicode MS".
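
Until the subsetting error is fixed, one way to live with it is to try "Subset" first and fall back to "Embed" when rendering throws. The following is a rough sketch rather than a tested FO.NET recipe: RenderFallback and FirstSuccessful are hypothetical names, the helper is deliberately generic so that it does not depend on FO.NET, and the commented usage assumes that the "Embed" setting corresponds to an enum member called FontType.Embed.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of FO.NET.
public static class RenderFallback
{
    // Calls attempt with each candidate in order and returns the result of
    // the first call that does not throw. Every exception is caught, which
    // is broad; a real application may want to catch only specific types.
    // If every candidate fails, an AggregateException with all failures is thrown.
    public static TResult FirstSuccessful<TOption, TResult>(
        IEnumerable<TOption> candidates, Func<TOption, TResult> attempt)
    {
        var failures = new List<Exception>();
        foreach (TOption candidate in candidates)
        {
            try
            {
                return attempt(candidate);
            }
            catch (Exception ex)
            {
                failures.Add(ex);
            }
        }
        throw new AggregateException("All candidates failed.", failures);
    }
}

// Hypothetical usage with the FO.NET calls shown earlier in this thread
// (FontType.Embed is an assumed member name; fo is the loaded XmlDocument):
//
// byte[] pdf = RenderFallback.FirstSuccessful(
//     new[] { FontType.Subset, FontType.Embed },
//     fontType =>
//     {
//         PdfRendererOptions options = new PdfRendererOptions();
//         options.FontType = fontType;
//         options.Kerning = false;
//         FonetDriver driver = FonetDriver.Make();
//         driver.Options = options;
//         using (MemoryStream buffer = new MemoryStream())
//         {
//             driver.Render(fo, buffer);
//             return buffer.ToArray();
//         }
//     });
// File.WriteAllBytes("test.pdf", pdf);
```

Each attempt renders into a memory buffer rather than directly into the output file, so a failed "Subset" attempt cannot leave a partially written PDF behind. The trade-off is the one mentioned earlier in the thread: an embedded "Arial Unicode MS" makes the PDF much larger than a subset would.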