Visual Objects

Please use this forum to post questions about Visual Objects and Vulcan.NET

TOPIC:

Reading Very Large files 16 Jun 2022 13:37 #22762

  • Plummet
  • Topic Author


  • Posts: 15
  • Hi All,
    When processing a very large file (VLF) in VO, I find that the number of lines output is less than the number of lines input.

    I wrote some code below to read a file line by line and return the line count.

    FUNCTION Start()
        LOCAL cInputFile AS STRING
        LOCAL nInputLineCount AS DWORD
        cInputFile := K_FILE_PATH
        ? cInputFile
        IF File(cInputFile)
            ? "Will count lines ..."
            WAIT
            nInputLineCount := GetLineCount(cInputFile)
            ? "Lines=", nInputLineCount
        ELSE
            ? cInputFile, "File not found"
        ENDIF
        WAIT
        RETURN NIL

    FUNCTION GetLineCount(cFile AS STRING) AS DWORD PASCAL
        // Assumes the file exists; returns 0 if it cannot be opened
        LOCAL pFile AS PTR
        LOCAL nCount AS DWORD
        pFile := FOpen(cFile, FO_READ)
        IF pFile == F_ERROR
            ? DosErrString(FError())
            RETURN 0
        ENDIF
        // Read one line per call (up to 1024 bytes) until end of file
        DO WHILE ! FEof(pFile)
            FGetS2(pFile, 1024)
            ++nCount
        ENDDO
        FClose(pFile)
        RETURN nCount

    DEFINE K_FILE_PATH := "EPD_202203.csv"


    The input file is 6,522,309,040 bytes in size and has 17,938,549 lines.
    This code run in VO gives 11,816,979 lines!
    This code run in X# gives the correct answer 17,938,549 lines.

    Can anyone please explain why VO will not read the entire file? Is it a bug in the VO runtime or a limit in the WIN32 API functions?

    You can find the actual data here if you want to test. Make sure you download the ZIP format!
    opendata.nhsbsa.net/dataset/english-pres...2c-8809-ac4540a962fd

    This post is linked to my other post on the Macro compiler. If I can solve one of the problems I can forget the other :)

    Don


    Reading Very Large files 16 Jun 2022 13:57 #22763

    • OhioJoe


  • Posts: 114
  • Two suggestions:

    1. If this code has worked before, then it's probably the input file. Try saving the file with an editor that enforces DOS line terminators: CHR(13) + CHR(10).

    2. Use FReadLine() instead of FGetS(). The documentation says there's no difference, but there might be.
    Joe Curran
    Ohio USA


    Reading Very Large files 16 Jun 2022 14:59 #22765

    • Plummet
    • Topic Author


  • Posts: 15
  • Thanks Joe.
    This file has standard line terminators.
    The problem only seems to happen with very large files (more than 11m lines?).
    I think I already tried FReadLine ... will check anyway.
    You can check this code on any file - just change the K_FILE_PATH value to point to your data.
    Don


    Reading Very Large files 16 Jun 2022 15:37 #22766

    • robert


  • Posts: 3178
  • Don,

    Are there lines in the file that are longer than 1024 bytes?

    Robert
    XSharp Development Team
    The Netherlands


    Reading Very Large files 16 Jun 2022 19:26 #22769

    • Plummet
    • Topic Author


  • Posts: 15
  • Thanks for your reply Robert.

    Line lengths vary (it's a CSV) but seem to be < 512, although I don't know if there's a long one hidden somewhere. It's difficult to look through a 6 GB file ...

    However the same code run in X# gives the correct answer of 17,938,549 lines.
    Don


    Reading Very Large files 17 Jun 2022 08:52 #22781

    • Chris


  • Posts: 3652
  • Don,

    It's very easy to check that with X#. Just use System.IO.File.ReadAllLines() in a small test app and then check the length of each line returned in the array.
    Of course you'll need to have enough memory in your system for this simple way to work! And compile in AnyCPU/x64 mode...

    XSharp Development Team
    chris(at)xsharp.eu
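
The line-length check discussed above can also be done without holding the file in memory. The following is a minimal sketch in Python (not VO or X#), offered only as an illustration; the function name `line_length_stats` and the default `limit` of 1024 (taken from the `FGetS2(pFile, 1024)` call in the original post) are assumptions made for this example.

```python
import io


def line_length_stats(stream, limit=1024):
    """Return (longest_line_bytes, lines_over_limit) for a binary stream.

    The stream is iterated line by line, so only one line is held in
    memory at a time.
    """
    longest = 0
    over = 0
    for raw in stream:
        # Measure the line without its CR/LF terminator
        length = len(raw.rstrip(b"\r\n"))
        longest = max(longest, length)
        if length > limit:
            over += 1
    return longest, over


if __name__ == "__main__":
    # Small in-memory sample; for a real file use open(path, "rb") instead
    sample = io.BytesIO(b"a,b,c\r\n" + b"x" * 2000 + b"\r\n" + b"last line")
    print(line_length_stats(sample))
```

For the real data, one would pass a file opened in binary mode, for example `open("EPD_202203.csv", "rb")`. A result with zero lines over the limit would rule out long lines as an explanation.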


    Reading Very Large files 17 Jun 2022 10:18 #22783

    • Plummet
    • Topic Author


  • Posts: 15
  • Thanks a lot for your reply Chris.

    Well I didn't want to read the file into memory as there is a hell of a lot of it!
    But I would like to know why, reading line by line, I was unable to get past 11m lines with VO. Is the blockage in the VO runtime or the underlying WIN32 API functions?

    Anyway, it's not really important now as Robert helped fix my macro problem, so I have successfully processed all 17m lines in X# - yess. It took about 30 mins (9 yr old pc)
    Thanks all for rapid response -
    Don


    Reading Very Large files 17 Jun 2022 12:23 #22785

    • Chris


  • Posts: 3652
  • Hi Don,

    I didn't mean to do this in your real app! :) I only suggested to do it in a small 10 line test app, just to find out if your file contains large lines.
    But it's not important anymore, only maybe if you wanted to do it out of curiosity.

    XSharp Development Team
    chris(at)xsharp.eu


    Reading Very Large files 17 Jun 2022 17:58 #22791

    • ArneOrtlinghaus


  • Posts: 337
  • The problem in VO is perhaps related to the size of the file. A file larger than 4 GB needs file offsets that are wider than a DWORD. I remember that the old and famous PKZIP also had file size limits.

    Arne


    Reading Very Large files 17 Jun 2022 18:26 #22793

    • Chris


  • Posts: 3652
  • Hi Arne,

    In VO, you would not load the whole file in memory, instead you would read line by line. And also in X#, for such huge files in real conditions you would normally do the same thing, but since this was only about a very small and quick test, I suggested doing it this crude way, in just 10 lines of code. Doing it properly and reading line by line and avoiding using a lot of memory would instead need 15 lines of code :)

    XSharp Development Team
    chris(at)xsharp.eu
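
As a rough illustration of the low-memory approach described above, here is a short Python sketch (not VO or X#) that counts lines by reading fixed-size chunks. The name `count_lines` and the 1 MiB default for `chunk_size` are assumptions chosen for this example. Memory use stays constant regardless of file size or line length.

```python
import io


def count_lines(stream, chunk_size=1 << 20):
    """Count lines in a binary stream by reading fixed-size chunks."""
    count = 0
    last = b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        # Every LF byte ends one line (CR+LF files also end lines with LF)
        count += chunk.count(b"\n")
        last = chunk[-1:]
    # A final line without a terminator still counts as a line
    if last and last != b"\n":
        count += 1
    return count


if __name__ == "__main__":
    print(count_lines(io.BytesIO(b"one\r\ntwo\r\nthree")))
```

Python file objects use 64-bit offsets on current platforms, so a sketch like this should not be affected by the 4 GB boundary.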


    Reading Very Large files 17 Jun 2022 18:54 #22794

    • robert


  • Posts: 3178
  • Chris,
    I think Arne is right. Most likely inside FGetS2 the runtime uses a 32-bit offset into a file that is 6 GB. That will result in an overflow with unexpected results.

    Robert
    XSharp Development Team
    The Netherlands
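
The figures from the original post can be used for a quick plausibility check of the 32-bit offset theory. The Python sketch below is only arithmetic on those figures; the helper names `wrap32` and `lines_before` are made up for this example, and the estimate assumes that lines are of roughly uniform length, which the thread does not confirm. It says nothing about where inside the VO runtime a limit might sit.

```python
FILE_SIZE = 6_522_309_040   # bytes, from the original post
TOTAL_LINES = 17_938_549    # lines in the file, from the original post
VO_LINES = 11_816_979       # lines counted by the VO build
DWORD_LIMIT = 2 ** 32       # first offset that no longer fits in a DWORD


def wrap32(offset):
    """Value left when an offset is truncated to an unsigned 32-bit DWORD."""
    return offset & 0xFFFFFFFF


def lines_before(byte_limit, file_size=FILE_SIZE, total_lines=TOTAL_LINES):
    """Estimate how many lines lie before byte_limit (uniform-length assumption)."""
    return total_lines * byte_limit // file_size


if __name__ == "__main__":
    print(FILE_SIZE >= DWORD_LIMIT)   # the file size does not fit in a DWORD
    print(wrap32(FILE_SIZE))          # what a 32-bit field would hold instead
    estimate = lines_before(DWORD_LIMIT)
    print(estimate, VO_LINES - estimate)
```

Under that assumption the estimate comes out at roughly 11.8 million lines before the 4 GiB mark, close to the 11,816,979 lines that VO reported, which is at least consistent with reading stopping near that boundary.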


    Reading Very Large files 17 Jun 2022 22:13 #22796

    • FFF


  • Posts: 1377
  • Hey, that was my first thought, too, but I was too timid to voice it... Thanks for confirming ;-)
    Regards
    Karl (X# 2.13.0.6; Xide 1.32; W8.1/64 German)


    Reading Very Large files 18 Jun 2022 01:48 #22797

    • Chris


  • Posts: 3652
  • Arne, Robert,

    Ah, I am sorry, I completely misread Arne's message! I thought it was about memory size, while it was about file pointers. Summer here is not a good time for the brain to work :)
    Of course it makes sense what you guys say, hadn't thought about this.

    XSharp Development Team
    chris(at)xsharp.eu
