Visual Objects

Please use this forum to post questions about Visual Objects and Vulcan.NET

TOPIC:

Xml file UTF-8 BOM 19 Apr 2021 20:02 #18082

  • diobrando
  • Topic Author


  • Posts: 35
  • Hi All,
    I'm having trouble with some electronic invoices in XML format that are encoded as UTF-8 with a BOM (Byte Order Mark).
    I use FOpen() to open the XML file and FReadLine() to read it line by line.

    With some of these files I'm getting some strange characters at the beginning, and I discovered that they are encoded with a BOM:
    "<?xml version"

    Is there any way to remove the BOM with VO?

    Thanks


    Xml file UTF-8 BOM 19 Apr 2021 20:22 #18083

    • Sherlock


  • Posts: 51
  • www.w3.org/International/questions/qa-byte-order-mark

    What I do is: if that string is found [  ], reduce it to [].
    I have XML that does not have it, but my editor/hex editor adds it.
    My XML code reader could not detect the <?xml version in the file.
    You could detect either as valid: "<?xml version" or "<?xml version"

    In a hex view of the file, the UTF-8 signature shows up as the bytes EF BB BF.
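The check described above (look for EF BB BF at the very start and drop it) can be sketched as follows. Python is used only as a compact, runnable illustration; in VO the same idea would presumably be a comparison of the first three bytes of the buffer with those values. `strip_utf8_bom` is a made-up name for this sketch.

```python
from typing import Tuple

UTF8_BOM = b"\xef\xbb\xbf"  # the UTF-8 signature bytes EF BB BF


def strip_utf8_bom(data: bytes) -> Tuple[bytes, bool]:
    """Return (data without a leading UTF-8 BOM, True if a BOM was removed)."""
    if data.startswith(UTF8_BOM):
        return data[len(UTF8_BOM):], True
    return data, False


if __name__ == "__main__":
    raw = UTF8_BOM + b'<?xml version="1.0" encoding="UTF-8"?>'
    body, had_bom = strip_utf8_bom(raw)
    print(had_bom, body[:13])  # True b'<?xml version'
```

Only the first three bytes are checked, because a BOM is only meaningful at the very beginning of a file.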
    Phil McGuinness


    Xml file UTF-8 BOM 19 Apr 2021 20:39 #18084

    • robert


  • Posts: 3276
  • Stefano,

    What you probably should do is:
    - skip the BOM when it exists
    - when the file has a BOM, use the function Utf82Ansi() to translate the strings that you read from the file from UTF-8 to ANSI. (This function is in the Util module inside the System Library.)
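A rough Python sketch of that two-step rule (skip the BOM, then convert UTF-8 to ANSI only when a BOM was present). It does not reproduce Utf82Ansi() itself; it assumes the ANSI code page is Windows-1252 (cp1252, Western European), and `bom_utf8_to_ansi` is a hypothetical name.

```python
UTF8_BOM = b"\xef\xbb\xbf"


def bom_utf8_to_ansi(data: bytes, ansi_codepage: str = "cp1252") -> bytes:
    """If the data starts with a UTF-8 BOM, skip it and convert the rest from
    UTF-8 to an ANSI code page; otherwise return the bytes unchanged.

    Characters with no equivalent in the ANSI code page become b"?".
    """
    if not data.startswith(UTF8_BOM):
        return data
    text = data[len(UTF8_BOM):].decode("utf-8")
    return text.encode(ansi_codepage, errors="replace")


if __name__ == "__main__":
    raw = UTF8_BOM + "Città 10 €".encode("utf-8")
    print(bom_utf8_to_ansi(raw))  # b'Citt\xe0 10 \x80'
```

The conversion is lossy by design: anything outside the chosen code page is replaced, which is worth keeping in mind for names or addresses in foreign scripts.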

    Robert
    XSharp Development Team
    The Netherlands


    Xml file UTF-8 BOM 19 Apr 2021 21:22 #18085

    • Chris


  • Posts: 3739
  • I would just use .Net methods for file access, since they are a lot more powerful and can automatically handle byte order marks, encodings, etc.:

    USING System.IO
    ...
    LOCAL oStream AS StreamReader
    LOCAL cLine AS STRING
    oStream := StreamReader{"c:\test\testutf.txt", TRUE} // automatically detect encoding
    DO WHILE oStream:Peek() != -1
    	cLine := oStream:ReadLine()
    	? cLine
    END DO
    oStream:Close() // release the file handle

    Or even simpler:

    System.IO.File.ReadAllLines("c:\test\testutf.txt") // returns an array of strings; the encoding is detected from the BOM
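For comparison, the same "let the runtime deal with the BOM" idea looks like this in Python, sketched only to show the behaviour. One difference: StreamReader with detection enabled also recognizes UTF-16/UTF-32 BOMs, while the "utf-8-sig" codec used here only handles a UTF-8 BOM (and also reads BOM-less UTF-8). `read_lines_skip_bom` is a made-up name.

```python
import os
import tempfile
from typing import List


def read_lines_skip_bom(path: str) -> List[str]:
    """Read a text file as UTF-8, dropping a leading BOM if there is one.

    The "utf-8-sig" codec removes the BOM; text mode with the default
    newline=None turns CRLF, CR and LF into "\n" while reading.
    """
    with open(path, encoding="utf-8-sig") as f:
        return [line.rstrip("\n") for line in f]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "test.xml")
        with open(path, "wb") as f:
            f.write(b'\xef\xbb\xbf<?xml version="1.0"?>\n<a>x</a>\n')
        print(read_lines_skip_bom(path))  # ['<?xml version="1.0"?>', '<a>x</a>']
```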


    Edit: Oops, sorry, I did not realize this is about VO!
    XSharp Development Team
    chris(at)xsharp.eu


    Last edit: by Chris.

    Xml file UTF-8 BOM 19 Apr 2021 22:09 #18087

    • ic2


  • Posts: 1573
  • Hello Stefano,

    Are you using VO or X#?

    We read (and create) UBL files in VO and it works fine so far. We read the XML string using the function below, and that is probably what could help you as well.


    Dick

    FUNCTION StringReadZeroNoAnsi(cPath AS STRING) AS STRING PASCAL
    //#s KB 24-1-2011
    //#s Alternative for MemoRead that is not SetAnsi dependent
    LOCAL cText AS STRING
    LOCAL ptrHandle AS PTR
    LOCAL dwFileSize AS DWORD
    LOCAL dwError AS DWORD

    cText := ""
    dwError := 0

    IF FFirst(String2Psz(cPath), FC_NORMAL)
        dwFileSize := FSize()
        IF dwFileSize > 0
            cText := Buffer(dwFileSize)          // raw buffer, no ANSI/OEM translation
            ptrHandle := FOpen2(cPath, FO_READ + FO_SHARED)
            dwError := FError()
            IF dwError == 0
                IF FRead(ptrHandle, @cText, dwFileSize) != dwFileSize
                    dwError := FError()          // short read: remember the error
                ENDIF
                FClose(ptrHandle)                // always close, even after a failed read
            ENDIF
        ENDIF
    ELSE
        dwError := FError()
    ENDIF

    IF dwError <> 0
        // Error handling
    ENDIF

    RETURN cText
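The key property of the function above is that it reads the file as raw bytes, so no ANSI/OEM conversion happens and the BOM (if any) is simply the first three bytes of the result. A minimal Python sketch of that idea, with a made-up name `read_file_raw`:

```python
import os
import tempfile


def read_file_raw(path: str) -> bytes:
    """Read the whole file into memory as raw bytes.

    No code-page translation is applied, so the result does not depend on any
    ANSI/OEM setting; a UTF-8 BOM, if present, is still the first 3 bytes.
    """
    expected = os.path.getsize(path)
    with open(path, "rb") as f:
        data = f.read()
    if len(data) != expected:
        # Same idea as comparing the number of bytes read with the file size.
        raise OSError(f"short read: got {len(data)} of {expected} bytes")
    return data


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "invoice.xml")
        with open(path, "wb") as f:
            f.write(b"\xef\xbb\xbf<a>x</a>")
        print(read_file_raw(path)[:3])  # b'\xef\xbb\xbf'
```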


    Xml file UTF-8 BOM 20 Apr 2021 06:06 #18088

    • wriedmann


  • Posts: 3244
  • Ciao Stefano,
    since the .NET XML functions are much more powerful, I have implemented the reading of the FPA/FPR invoices in X# and I'm using it through a COM module in VO.
    That also helps with removing any signature that may be present in the case of a p7m file.
    If you are interested, I can give you a part of the code (my complete code includes also the sending and receiving code to the web service of my provider).
    Wolfgang
    Wolfgang Riedmann
    Meran, South Tyrol, Italy

    www.riedmann.it - docs.xsharp.it


    Xml file UTF-8 BOM 20 Apr 2021 06:16 #18089

    • diobrando
    • Topic Author


  • Posts: 35
  • Thanks all for the suggestions,
    I'll try them.

    With FReadLine() I'm also getting strings 256 bytes long with no CRLF.
    I opened the XML file with Notepad++, and the status bar shows Unix (LF) UTF-8 BOM.
    This file is generated by web-based electronic invoicing software, in this case Aruba fatturazione elettronica.

    For other XML files it shows Windows (CRLF) + UTF-8, and those files give me no problems.

    How can I work around this with VO?
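The two things Notepad++ reports here (line-ending style and BOM) can be checked directly on the bytes. A reader that only ends a line at CRLF would see a Unix (LF) file as one long line, which may explain the fixed-length chunks; whether FReadLine() behaves exactly that way is not confirmed in this thread. A Python sketch, with a hypothetical `describe_text_file` (it does not detect old Mac-style CR-only files):

```python
from typing import Tuple

UTF8_BOM = b"\xef\xbb\xbf"


def describe_text_file(data: bytes) -> Tuple[str, str]:
    """Roughly mimic two status-bar fields: line-ending style and BOM presence."""
    crlf = data.count(b"\r\n")
    bare_lf = data.count(b"\n") - crlf  # LF not preceded by CR
    if crlf and bare_lf:
        eol = "Mixed (CRLF and LF)"
    elif crlf:
        eol = "Windows (CRLF)"
    elif bare_lf:
        eol = "Unix (LF)"
    else:
        eol = "No LF found"
    bom = "UTF-8 BOM" if data.startswith(UTF8_BOM) else "no UTF-8 BOM"
    return eol, bom


if __name__ == "__main__":
    print(describe_text_file(b"\xef\xbb\xbf<a>\n</a>\n"))  # ('Unix (LF)', 'UTF-8 BOM')
```

Note that "no UTF-8 BOM" does not mean "not UTF-8": many UTF-8 files have no signature at all.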


    Xml file UTF-8 BOM 20 Apr 2021 06:24 #18090

    • diobrando
    • Topic Author


  • Posts: 35
  • wriedmann wrote: Ciao Stefano,
    since the .NET XML functions are much more powerful, I have implemented the reading of the FPA/FPR invoices in X# and I'm using them in through a COM module in VO.
    That also helps removing the eventual present signature in case of a p7m file.
    If you are interested, I can give you a part of the code (my complete code includes also the sending and receiving code to the web service of my provider).
    Wolfgang


    Hi Wolfgang,
    thanks for the support.

    I would be interested in trying what you have done.
    Currently I'm un-signing the .p7m files with the openssl command via ShellExecute(), and it is working fine.
    Are you using a scraping technique to send/receive files through the web service?

    Thanks
    Stefano


    Last edit: by diobrando.

    Xml file UTF-8 BOM 20 Apr 2021 06:25 #18091

    • wriedmann


  • Posts: 3244
  • Ciao Stefano,
    you need to read the entire file and then use MemoLine() to split the lines, maybe after using StrTran() to replace all LF with CRLF.
    But please be aware that received files may come in several different formats, maybe even with the entire data on one line without any line break; I have seen a lot of different things by now. Your read function should not depend on any newline.
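One pitfall with a plain "replace LF with CRLF": if the file already contains CRLF (or mixes both), every existing CRLF becomes CR CR LF. Collapsing CRLF to LF first avoids that; in VO this would presumably be two nested StrTran() calls. A Python sketch with a made-up name:

```python
def normalize_to_crlf(text: str) -> str:
    """Turn every LF line break into CRLF without doubling existing CRLFs."""
    # Collapse CRLF to LF first; otherwise replacing LF with CRLF would turn an
    # existing CRLF into CR CR LF.
    return text.replace("\r\n", "\n").replace("\n", "\r\n")


if __name__ == "__main__":
    mixed = "<a>\n<b/>\r\n</a>"
    naive = mixed.replace("\n", "\r\n")
    print(repr(naive))                     # '<a>\r\n<b/>\r\r\n</a>'
    print(repr(normalize_to_crlf(mixed)))  # '<a>\r\n<b/>\r\n</a>'
```

As noted above, this only helps line-oriented code; it does nothing for a file that has no line breaks at all.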
    Wolfgang

    Xml file UTF-8 BOM 20 Apr 2021 06:29 #18092

    • wriedmann


  • Posts: 3244
  • Ciao Stefano,
    (for others: in Italy all invoices need to be sent in a specific XML format through a system maintained by the Ministry of Finance):
    to remove the signature I'm using a simple .NET call.
    For sending and receiving the invoices I'm using an API that my provider has. AFAIK Aruba also has a sort of API, and it is much, much simpler to do that in .NET than in plain VO.
    Therefore I have all that functionality in a X# module that is used through COM in my VO applications.
    Wolfgang

    Xml file UTF-8 BOM 20 Apr 2021 08:26 #18093

    • ArneOrtlinghaus


  • Posts: 337
  • We use the attached function FGetFileEncoding to determine the format of the file.
    It returns two values:
    - The encoding (ANSI, UTF-8, Unicode)
    - The characters to omit (the BOM markers)
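The attached FGetFileEncoding is not reproduced in the thread, so this is only a guess at the same idea in Python: look at the leading bytes, return an encoding name plus the number of bytes to skip. `detect_bom` is hypothetical; UTF-32 BOMs are not handled, and "ANSI" for a file without a BOM is just a fallback guess (a BOM-less XML file is often UTF-8).

```python
from typing import Tuple

# Order matters only if longer BOMs sharing a prefix are added (e.g. UTF-32 LE).
_BOMS = (
    (b"\xef\xbb\xbf", "UTF-8"),
    (b"\xff\xfe", "UTF-16-LE"),  # what Windows usually calls "Unicode"
    (b"\xfe\xff", "UTF-16-BE"),
)


def detect_bom(data: bytes) -> Tuple[str, int]:
    """Return (encoding name, number of leading bytes to skip)."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return "ANSI", 0


if __name__ == "__main__":
    print(detect_bom(b"\xff\xfe<\x00?\x00"))  # ('UTF-16-LE', 2)
```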

    For reading lines while automatically handling CRLF, CR or LF line endings correctly, we use the attached object clsfilebuffered input. (Probably some functions are missing; if someone is interested, please write to me.)
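Splitting on any of the three line-ending styles can be sketched with a single regular expression; CRLF has to be listed first so it counts as one break rather than two. This is an illustration of the technique, not the attached class; `split_lines_any_eol` is a made-up name.

```python
import re
from typing import List

_EOL = re.compile(r"\r\n|\r|\n")  # CRLF first, so it is not split into CR + LF


def split_lines_any_eol(text: str) -> List[str]:
    """Split text into lines, accepting CRLF, CR or LF as line endings."""
    lines = _EOL.split(text)
    if lines and lines[-1] == "":
        lines.pop()  # a final line break does not start another (empty) line
    return lines


if __name__ == "__main__":
    print(split_lines_any_eol("a\r\nb\rc\nd\n"))  # ['a', 'b', 'c', 'd']
```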

    File Attachment:

    File Name: ffileencoding.txt
    File Size:32 KB

    File Attachment:

    File Name: clsfile.txt
    File Size:51 KB

    Arne

    Xml file UTF-8 BOM 20 Apr 2021 10:36 #18099

    • ic2


  • Posts: 1573
  • Hello Wolfgang,

    wriedmann wrote: it is much, much simpler do that in .NET than in plain VO..... I'm using them in through a COM module in VO


    I basically do the same for multiple methods which are easier to implement in .Net than in VO. But I didn't find anything easier concerning XML in .Net. Reading UBL files, the proposed European standard for invoices, takes just 2 functions in VO plus code like this, 14 lines in total to read all relevant data.

    SELF:cXml:=StringReadZeroNoAnsi(cFile) // Read XML
    SELF:cInvoiceDateXML:=SeekXMLElement("cbc:IssueDate",cXml,1) // etc
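SeekXMLElement() itself is not shown in the thread. A plain string-search function of that kind might look roughly like this Python sketch; the third argument is assumed to be the occurrence number, and `seek_xml_element` is a hypothetical name. It does no real XML parsing, which is what keeps it short but also limits it.

```python
def seek_xml_element(tag: str, xml: str, occurrence: int = 1) -> str:
    """Return the text between the occurrence-th <tag ...> and the following
    </tag>, or "" if it is not found.

    Plain string search only: no namespace resolution, no entity or CDATA
    decoding, and self-closing <tag/> elements are skipped.
    """
    open_prefix = "<" + tag
    close_tag = "</" + tag + ">"
    pos = 0
    found = 0
    while True:
        start = xml.find(open_prefix, pos)
        if start < 0:
            return ""
        after = start + len(open_prefix)
        # Require '>' or whitespace next, so "<cbc:IssueDateX" does not match.
        if after < len(xml) and xml[after] in "> \t\r\n":
            found += 1
            if found == occurrence:
                gt = xml.find(">", after)
                end = xml.find(close_tag, gt + 1) if gt >= 0 else -1
                return xml[gt + 1:end] if end >= 0 else ""
        pos = after


if __name__ == "__main__":
    doc = ("<Invoice><cbc:IssueDate>2021-04-19</cbc:IssueDate>"
           "<cbc:IssueDateX>no</cbc:IssueDateX></Invoice>")
    print(seek_xml_element("cbc:IssueDate", doc, 1))  # 2021-04-19
```

A leading BOM does not disturb this kind of search, since it only looks for tag text inside the string.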

    I do not see how that could be done easier in .Net.

    Dick


    Xml file UTF-8 BOM 20 Apr 2021 10:49 #18100

    • wriedmann


  • Posts: 3244
  • Hi Dick,
    implementing the XML functions seemed easier to me in .NET than in VO.
    And since I already need a .NET module for the webservice interaction, I have simply added my COM module.
    Wolfgang

    Xml file UTF-8 BOM 20 Apr 2021 13:35 #18101

    • diobrando
    • Topic Author


  • Posts: 35
  • Hi all,
    I'm now testing the Chilkat ActiveX COM component, and so far so good.

    I need to manipulate strings extracted from the XML file node by node (I don't need all of them), as they have to be processed later before being inserted into a temp DB.

    I also need to extract the XML data from digitally signed files (.p7m) and, sometimes, to extract the PDF attached to the XML file for archiving purposes.
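Picking out only some nodes is straightforward with any real XML parser. This sketch uses Python's standard-library ElementTree, not Chilkat, purely to illustrate the idea: strip a BOM if present, parse, then collect the first element whose local name (namespace ignored) matches each wanted field. The element names and namespace in the sample are made up; `extract_fields` is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterable

UTF8_BOM = b"\xef\xbb\xbf"


def extract_fields(data: bytes, wanted: Iterable[str]) -> Dict[str, str]:
    """Return the text of the first element whose local name matches each
    wanted name, ignoring any namespace on the element."""
    if data.startswith(UTF8_BOM):
        data = data[len(UTF8_BOM):]
    root = ET.fromstring(data)
    names = set(wanted)
    found: Dict[str, str] = {}
    for elem in root.iter():  # document order, root included
        local = elem.tag.rsplit("}", 1)[-1]  # "{uri}Name" -> "Name"
        if local in names and local not in found:
            found[local] = (elem.text or "").strip()
    return found


if __name__ == "__main__":
    sample = (UTF8_BOM
              + b'<?xml version="1.0" encoding="UTF-8"?>'
              + b'<p:Doc xmlns:p="urn:example:invoice">'
              + b"<Header><Numero>12</Numero><Data>2021-04-19</Data></Header>"
              + b"</p:Doc>")
    print(extract_fields(sample, ["Data", "Numero"]))  # {'Numero': '12', 'Data': '2021-04-19'}
```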

    Stefano


    Xml file UTF-8 BOM 20 Apr 2021 14:55 #18102

    • Sherlock


  • Posts: 51
  • Not tested, but some wanted a simple VO syntax. I remember this worked in some memo conversions.
    I cannot remember if it was TRUE then FALSE or FALSE then TRUE. Worth a shot.

    SetAnsi(TRUE)
    cData := MemoRead(cFile) // cFile: path of the XML file
    SetAnsi(FALSE)


    Xml file UTF-8 BOM 20 Apr 2021 15:59 #18103

    • ic2


  • Posts: 1573
  • Hello Phil,

    Sherlock wrote: Not tested and some wanted a simple VO syntax.. I remember this worked in some memo conversions.


    TRUE specifies the ANSI format; FALSE specifies the OEM format.

    So it's still a conversion. If something is converted incorrectly with ANSI, then OEM may solve that, but it also may not. The function I published a few messages above does not have this issue.
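To see why switching between ANSI and OEM does not help with the BOM itself: both only reinterpret the same three bytes as three different characters, while decoding as UTF-8 turns them into the single invisible character U+FEFF that a UTF-8-aware reader can drop. A small Python check, assuming cp1252 as the ANSI code page and cp850 as the OEM one (the real pages depend on the system locale):

```python
BOM = b"\xef\xbb\xbf"

# Assumption: cp1252 stands in for the Windows ANSI code page and cp850 for
# the OEM (DOS) code page.
as_ansi = BOM.decode("cp1252")
as_oem = BOM.decode("cp850")
as_utf8 = BOM.decode("utf-8")

print(as_ansi == "ï»¿")     # True: the classic "strange characters" at the start
print(len(as_oem))          # 3: still three stray characters, just different ones
print(as_utf8 == "\ufeff")  # True: one character, ZERO WIDTH NO-BREAK SPACE
```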

    Dick
