Input and Output of Binary Data
Binary I/O involves the transfer of data between a file and memory without conversion to and from a character representation. Binary I/O is used when efficiency is important and portability is not an issue; it is faster and requires less space than human-readable I/O.
 
Note: Binary I/O is almost always used for the transfer of image data, such as TIFF images, or 8- and 24-bit images.
PV-WAVE provides many procedures and functions for performing binary I/O; they are listed in "Binary I/O Routines". All of these routines are described in this section except ASSOC and GET_KBRD; these important functions are discussed in "Associated Variable Input and Output" and "Getting Input from the Keyboard".
Input and Output of Image Data
Images are frequently stored using either 8-bit or 24-bit binary data. 8-bit data is capable of displaying 2^8 (256) different colors, while 24-bit data is capable of displaying 2^24 (16,777,216) different colors.
Images are treated in the same manner as any variable. Images may be either square or rectangular. There is no restriction placed on the size of images; the limiting factors are the maximum amount of virtual memory made available to you by the operating system and the processing time required.
8-bit and 24-bit Image Data
Image data is usually stored in either an 8-bit or 24-bit format:
*8-bit Format—Images in 256 shades of gray or 256 discrete colors (sometimes known as “pseudo-color”).
*24-bit Format—3-color RGB (8 bits Red/8 bits Green/8 bits Blue) images.
8-bit images must be stored in a 2-dimensional variable, and 24-bit images must be stored in a 3-dimensional variable. For more information about how the RGB information in 24-bit image data is stored, refer to "Image Interleaving".
 
Note: Your workstation or device must support 24-bit color mode if you intend to view 24-bit images with PV-WAVE.
Image Data Input
Image data can be imported using either the READU or the ASSOC commands. However, one of the easiest ways to import image data is to use either the DC_READ_8_BIT or DC_READ_24_BIT functions. For example, if the file hero.img contains a 786432-byte (512 x 512 x 3) 24-bit image-interleaved image, the function call:
status = DC_READ_24_BIT('hero.img', hero, Org=1)
reads the file hero.img and creates a 512-by-512-by-3 image-interleaved byte array named hero.
When you do not pre-dimension the variable, PV-WAVE creates either a two- or three-dimensional byte variable, depending on whether you are using DC_READ_8_BIT or DC_READ_24_BIT. It also checks the total number of bytes in the file and automatically dimensions the import variable such that it matches the organization of the file.
To see a complete list of the image sizes that PV-WAVE checks for as it reads image data, refer to the function descriptions for DC_READ_8_BIT and DC_READ_24_BIT; you can find these descriptions in the PV‑WAVE Reference.
 
Note: If you do not want PV-WAVE to guess the dimensions of the variable, you need to dimension it explicitly.
For 8-bit image data, dimension the variable as w-by-h, where w and h are the width and height of the image in pixels. For 24-bit image data, the image variable should be dimensioned in the following manner:
*Pixel Interleaved—Dimension the import variable as 3-by-w-by-h, where w and h are the width and height of the image in pixels.
*Image Interleaved—Dimension the import variable as w-by-h-by-3, where w and h are the width and height of the image in pixels.
For a comparison of pixel interleaving and image interleaving, refer to "Image Interleaving".
 
Note: One popular way of importing binary image data is with the ASSOC command. The advantages of this method are described further in "Advantages of Associated File Variables".
Image Data Output
Image data can be exported using either the WRITEU or the ASSOC commands. However, one of the easiest ways to output image data is to use either the DC_WRITE_8_BIT or DC_WRITE_24_BIT functions. For example, if fft_flow is a 600-by-800 byte array containing image data, the function call:
status = DC_WRITE_8_BIT('fft_flow1.img', fft_flow)
creates the file fft_flow1.img and uses it to store the image data contained in the variable fft_flow.
The dimensionality of the output image variable should be the same as discussed in the previous section for image data input.
 
Note: One popular way of exporting binary image data is with the ASSOC command. The advantages of this method are described further in "Advantages of Associated File Variables".
TIFF Image Data
The TIFF (Tag Image File Format) is a standard format for encoding image data. Rogue Wave’s TIFF I/O follows the guidelines set forth in a Technical Memorandum, Tag Image File Format Specification, Revision 5.0 (FINAL), published jointly by Aldus™ Corporation and Microsoft® Corporation.
The two functions provided specifically for transferring TIFF images are:
DC_READ_TIFF
DC_WRITE_TIFF
These functions are easy to use. For example, if the variable maverick is a 512-by-512 byte array, the function call:
status = DC_WRITE_TIFF('mav.tif', maverick, $ 
Class='Bilevel', Compress='Pack')
creates the file mav.tif and uses it to store the image data contained in the variable maverick. The created TIFF file is compressed and conforms to the TIFF Bilevel classification.
For additional details about the DC_READ_TIFF and DC_WRITE_TIFF functions, see their descriptions in the PV‑WAVE Reference.
Compressed TIFF Files
TIFF files can be compressed if you are interested in saving disk space. Compressed TIFF files will take slightly longer to open than uncompressed TIFF files, but are a smart choice if you are willing to trade off a slightly slower access time for reduced file size.
Only TIFF class Bilevel (Class 'B') images can be compressed.
TIFF Conformance Levels
When using DC_READ_TIFF and DC_WRITE_TIFF, you are able to select the class (level of TIFF conformance) that you wish to follow. The four conformance levels are:
*Bilevel—All pixels are either black or white; no shades of gray are supported.
*Grayscale—Each pixel is described by eight bits (a byte). With eight bits, 2^8 (256) shades of gray can be represented.
*Palette Color—Each pixel is described by eight bits (a byte), so 2^8 (256) discrete colors can be represented. During output, you must supply a colortable that can be stored with the image; you do this using the Palette keyword.
*RGB Full Color—Each pixel is described by 24 bits (1 byte red, 1 byte green, and 1 byte blue). With 24 bits, 2^24 (16,777,216) full RGB colors can be represented.
If Palette Color is selected, you must supply (using the Palette keyword) a 3-by-256 array of integers that describes the colortable to be used by the TIFF image.
If RGB Full Color is selected, the export variable must be a w-by-h-by-3 byte image interleaved array. (The letters w and h denote the width and height of the image, respectively.) Pixel interleaved 24-bit data cannot be exported to a TIFF file. The details of pixel interleaving and image interleaving are described in the next section.
Image Interleaving
Interleaving is the method used to organize the bytes of red, green, and blue image data in a 24-bit image. In other words, each of the basic colors requires 1 byte (8 bits) of storage for each pixel on the screen; the question is whether to store the color data as RGB triplets, or to group all the red bytes together, all the green bytes together, and all the blue bytes together. The two options are shown in Interleaving Options:
 
Table 9-13: Interleaving Options
Pixel Interleaving    Image Interleaving
RGBRGBRGBRGB          RRRRRRRRRRRR
RGBRGBRGBRGB          GGGGGGGGGGGG
RGBRGBRGBRGB          BBBBBBBBBBBB
For more information about how the image variable should be dimensioned to match the various interleaving methods, refer to "Image Data Input".
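The two layouts differ only in how a (column, row, color) triple maps to a byte offset in the file. The following C sketch illustrates that mapping and a conversion from pixel-interleaved to image-interleaved data. The function names are hypothetical and are not part of PV-WAVE. The sketch assumes that the first dimension varies fastest in the file, which matches the 3-by-w-by-h and w-by-h-by-3 dimensioning described in "Image Data Input".

```c
#include <assert.h>
#include <stddef.h>

/* Byte offset of color c (0 = R, 1 = G, 2 = B) of pixel (x, y) in a
 * pixel-interleaved image dimensioned 3-by-w-by-h. */
size_t pixel_interleaved_offset(size_t x, size_t y, size_t c, size_t w)
{
    assert(c < 3 && x < w);
    return c + 3 * (x + w * y);
}

/* Byte offset of the same value in an image-interleaved image
 * dimensioned w-by-h-by-3. */
size_t image_interleaved_offset(size_t x, size_t y, size_t c,
                                size_t w, size_t h)
{
    assert(c < 3 && x < w && y < h);
    return x + w * (y + h * c);
}

/* Reorder pixel-interleaved data into image-interleaved data.
 * src and dst must each hold 3*w*h bytes and must not overlap. */
void pixel_to_image_interleaved(const unsigned char *src,
                                unsigned char *dst, size_t w, size_t h)
{
    size_t x, y, c;

    for (c = 0; c < 3; c++)
        for (y = 0; y < h; y++)
            for (x = 0; x < w; x++)
                dst[image_interleaved_offset(x, y, c, w, h)] =
                    src[pixel_interleaved_offset(x, y, c, w)];
}
```

Under these assumptions, the last blue byte of a 512-by-512 image-interleaved image sits at offset 786431, the final byte of a 786432-byte file.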
READU and WRITEU
READU and WRITEU provide basic binary (unformatted) input and output capabilities. WRITEU writes the contents of its variable list directly to the file, and READU reads exactly the number of bytes required by the size of its parameters. Both procedures transfer binary data directly, with no interpretation or formatting.
The general form for using either READU or WRITEU is:
READU, unit, var1,...,varn
WRITEU, unit, var1,...,varn
where var1 through varn represent one or more variables (or expressions in the case of output).
Transferring Data with READU and WRITEU
The following examples demonstrate how to transfer data using READU and WRITEU.
Example 1—C Program Writes, PV-WAVE Reads
The following C program produces a file containing employee records. Each record stores the first name of the employee, the number of years they have been employed, and their salary history for the last 12 months.
C Program Write
#include <stdio.h>

int main(void)
{
    static struct rec {
        char name[16];    /* Employee's name */
        int years;        /* Years with company */
        float salary[12]; /* Salary for last 12 months */
    } employees[] = {
        {"Bullwinkle", 10,
         {1000.0, 9000.97, 1100.0, 0.0, 0.0, 2000.0, 5000.0, 3000.0, 1000.12, 3500.0, 6000.0, 900.0} },
        {"Boris", 11,
         {400.0, 500.0, 1300.10, 350.0, 745.0, 3000.0, 200.0, 100.0, 100.0, 50.0, 60.0, 0.25} },
        {"Natasha", 10,
         {950.0, 1050.0, 1350.0, 410.0, 797.0, 200.36, 2600.0, 2000.0, 1500.0, 2000.0, 1000.0, 400.0} },
        {"Rocky", 11,
         {1000.0, 9000.0, 1100.0, 0.0, 0.0, 2000.37, 5000.0, 3000.0, 1000.01, 3500.0, 6000.0, 900.12} }
    };
    FILE *outfile;

    /* Open in binary mode so that the bytes are written unchanged. */
    outfile = fopen("bullwinkle.dat", "wb");
    if (outfile == NULL) {
        perror("bullwinkle.dat");
        return 1;
    }
    (void) fwrite(employees, sizeof(employees), 1, outfile);
    (void) fclose(outfile);
    return 0;
}
Running this program creates the file bullwinkle.dat containing the employee records.
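Any program that reads this file has to reproduce the record layout byte for byte. The following C sketch is an illustration only and is not part of the original example; the names rec_layout_is_expected and write_records are hypothetical. It assumes a 4-byte int and a 4-byte float with no padding between members, which holds on common platforms but is not guaranteed by the C language, and it refuses to write if the compiler lays the structure out differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

struct rec {
    char name[16];    /* bytes 0-15  */
    int years;        /* bytes 16-19 (assumes 4-byte int) */
    float salary[12]; /* bytes 20-67 (assumes 4-byte float) */
};

/* Returns 1 if struct rec has the assumed 68-byte layout, else 0. */
int rec_layout_is_expected(void)
{
    return sizeof(int) == 4 && sizeof(float) == 4 &&
           offsetof(struct rec, years) == 16 &&
           offsetof(struct rec, salary) == 20 &&
           sizeof(struct rec) == 68;
}

/* Writes n records to path in binary mode.
 * Returns the number of bytes written, or -1 on failure. */
long write_records(const char *path, const struct rec *recs, size_t n)
{
    FILE *f;
    size_t written;

    assert(rec_layout_is_expected()); /* stop on a surprising layout */
    f = fopen(path, "wb");
    if (f == NULL)
        return -1;
    written = fwrite(recs, sizeof(struct rec), n, f);
    if (fclose(f) != 0 || written != n)
        return -1;
    return (long)(written * sizeof(struct rec));
}
```

With this layout, a file of four records occupies 4 x 68 = 272 bytes, so a quick check of the file size can catch a layout mismatch before any data is read.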
PV-WAVE Read
The following PV-WAVE statements can be used to read the data in bullwinkle.dat:
; Create a string with 16 characters so that the proper number
; of characters will be input from the file. REPLICATE is used to 
; create a byte array of 16 elements, each containing the ASCII 
; code for a space (32). STRING turns this byte array into a
; string containing 16 blanks.
str16 = STRING(REPLICATE(32b,16))
; Create a structure of four employee records to receive the
; input data.
A = REPLICATE({employees, name:str16, $ 
years:0L, salary:fltarr(12)}, 4)
; Open the file for input.
OPENR, 1, 'bullwinkle.dat'
; Read the data.
READU, 1, A
; Close the file.
CLOSE, 1
For other examples of how to read bullwinkle.dat with PV-WAVE, refer to "Reading, Sorting, and Printing Tables of Formatted Data".
Example 2—PV-WAVE Writes, C Program Reads
The following programs demonstrate how PV-WAVE can produce a file containing an array of floating-point values, and how a C program can read that file and display the values.
PV-WAVE Write
The following PV-WAVE program creates a binary data file containing a 5-by-5 array of floating-point values:
; Open a file for output.
OPENW, 1, 'float.dat'
; Write a 5-by-5 array with each element set equal to its 
; one-dimensional index.
WRITEU, 1, FINDGEN(5, 5)
; Close the file.
CLOSE, 1
C Program Read
The file float.dat can be read and printed by the following C program:
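#include <stdio.h>

int main(void)
{
    float data[5][5];
    FILE *infile;
    int i, j;

    /* Open in binary mode so that the bytes are read unchanged. */
    infile = fopen("float.dat", "rb");
    if (infile == NULL) {
        perror("float.dat");
        return 1;
    }
    (void) fread(data, sizeof(data), 1, infile);
    (void) fclose(infile);
    for (i = 0; i < 5; i++) {
        for (j = 0; j < 5; j++)
            printf("%8.1f", data[i][j]);
        printf("\n");
    }
    return 0;
}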
Running this program results in the following output:
     0.0     1.0     2.0     3.0     4.0
     5.0     6.0     7.0     8.0     9.0
    10.0    11.0    12.0    13.0    14.0
    15.0    16.0    17.0    18.0    19.0
    20.0    21.0    22.0    23.0    24.0
Binary Transfer of String Variables
The only basic data type that does not have a fixed size is the string data type. A string variable has a dynamic length that is dependent only on the length of the string currently assigned to it. Thus, although it is always possible to know the length of the other types, string variables are a special case. PV-WAVE uses the following rules to determine the number of characters to transfer:
*Input—Input enough bytes to fill the currently defined length of the string variable.
*Output—Output the number of bytes contained in the string. This number is the same number that would be returned by the STRLEN function. In other words, the output string contains only the characters in the string and does not include a terminating null byte.
These rules imply that when reading into a string variable from a file, you must usually know the length of the original string so as to be able to initialize the destination string to the correct length. The following example demonstrates the problem and shows how to use the STRLEN function to programmatically initialize the string length.
Examples of Binary String Data Transfer
For example, the following statements:
; Open a file.
OPENW, 1, 'temp.txt'
; Write an 11-character string.
WRITEU, 1, 'Hello World'
; Rewind the file.
POINT_LUN, 1, 0
; Prepare a 9-character string.
A = '         '
; Read the string in again.
READU, 1, A
; Show what was input.
PRINT, A
CLOSE, 1
produce the following output because the receiving variable A was not long enough:
Hello Wor
The only solution to this problem is to know the length of the string being input. One way to do this is to store the length of the string(s) in the file at the time the file is created. The following statements demonstrate a technique for doing this:
; Define a string variable that contains the desired string.
hello = 'Hello World'
; Store the length of the string as a long integer, so that the
; size of the value written to the file is known.
len = LONG(STRLEN(hello))
; Open a file.
OPENW, 1, 'temp.txt'
; Write the string length to the file.
WRITEU, 1, len
; Now write the string to the file.
WRITEU, 1, hello
Now that the string length (a long integer), followed by the string, has been stored in the file, rewind the file and read the string back into PV-WAVE:
; Rewind the file.
POINT_LUN, 1, 0
; Initialize a long integer variable, and then use it to read the
; string length. It must have the same type as the value written.
len_input = 0L
READU, 1, len_input
; Create a string of the desired length, initialized with blanks. 
; The result of the call to REPLICATE is a byte array with the 
; necessary number of elements, each element initialized to 32, 
; which is the ASCII code for a blank. When this byte array is 
; passed to STRING, it is converted to a scalar string containing 
; this number of blanks.
A = STRING(REPLICATE(32b, len_input))
; Read the string.
READU, 1, A
; Show what was input.
PRINT, A
CLOSE, 1
These statements produce the following output:
Hello World
This example takes advantage of the special way in which the BYTE and STRING functions convert between byte arrays and strings. See the descriptions of the BYTE and STRING functions for additional details. These descriptions are alphabetically arranged in the PV‑WAVE Reference.
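The same length-prefix technique can be sketched in C, for example when a C program has to exchange counted strings with another program. This is a minimal sketch, not a PV-WAVE interface: the function names are hypothetical, and it assumes a 4-byte length in the byte order of the host. The size of the length value that PV-WAVE writes depends on the platform, so the prefix width must be agreed on by both sides.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Writes a 4-byte length followed by the characters of s, with no
 * terminating null byte. Returns 0 on success, -1 on failure. */
int write_counted_string(FILE *f, const char *s)
{
    size_t n = strlen(s);
    int32_t len;

    assert(n <= INT32_MAX); /* length must fit in the 4-byte prefix */
    len = (int32_t)n;
    if (fwrite(&len, sizeof len, 1, f) != 1)
        return -1;
    if (n > 0 && fwrite(s, 1, n, f) != n)
        return -1;
    return 0;
}

/* Reads a counted string. Returns a malloc'ed, null-terminated copy
 * that the caller must free, or NULL on failure. */
char *read_counted_string(FILE *f)
{
    int32_t len;
    char *s;

    if (fread(&len, sizeof len, 1, f) != 1 || len < 0)
        return NULL;
    s = malloc((size_t)len + 1);
    if (s == NULL)
        return NULL;
    if (len > 0 && fread(s, 1, (size_t)len, f) != (size_t)len) {
        free(s);
        return NULL;
    }
    s[len] = '\0'; /* the file holds no terminator; add one here */
    return s;
}
```

Because the reader learns the length before it allocates the destination, the truncation shown in the first example cannot occur.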
Reading UNIX FORTRAN-Generated Binary Data
Although the UNIX operating system considers all files to be an uninterpreted stream of bytes, FORTRAN considers all I/O to be done in terms of logical records. In order to reconcile the FORTRAN need for logical records with the UNIX operating system, UNIX FORTRAN programs add a longword count before and after each logical record of data. These longwords contain an integer count giving the number of bytes in that record.
The use of the F77_Unformatted keyword with the OPENR statement informs PV-WAVE that the file contains binary data produced by a UNIX FORTRAN program. When a file is opened with this keyword, PV-WAVE interprets the longword counts properly, and is able to read and write files that are compatible with FORTRAN.
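To make the record framing concrete, the following C sketch reads one logical record by hand. It is an illustration of the layout described above, not of how PV-WAVE implements F77_Unformatted. The function name read_f77_record is hypothetical, and the sketch assumes 4-byte record counts in the byte order of the host; some FORTRAN compilers use a different marker size.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Reads one FORTRAN unformatted record into buf, which can hold
 * capacity bytes. Returns the number of data bytes in the record,
 * or -1 on end of file, I/O error, a record larger than capacity,
 * or leading and trailing counts that disagree. */
long read_f77_record(FILE *f, void *buf, size_t capacity)
{
    uint32_t head, tail;

    assert(f != NULL && buf != NULL);
    if (fread(&head, sizeof head, 1, f) != 1)
        return -1;                  /* no further record */
    if (head > capacity)
        return -1;
    if (head > 0 && fread(buf, 1, head, f) != head)
        return -1;
    if (fread(&tail, sizeof tail, 1, f) != 1)
        return -1;
    if (tail != head)
        return -1;                  /* framing is inconsistent */
    return (long)head;
}
```

Under these assumptions, a file that holds one record of 25 floating-point values occupies 4 + 100 + 4 = 108 bytes, and a second call on that file returns -1 because no further record exists.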
Example—UNIX FORTRAN Program Writes, PV-WAVE Reads
The following UNIX FORTRAN program produces a file containing a 5-by-5 array of floating-point values, with each element set to its one-dimensional subscript. It is thus a FORTRAN implementation of the FINDGEN function for the special case of a 5-by-5 array.
FORTRAN Write
      INTEGER I, J
      REAL DATA(5, 5)
      OPEN(1, STATUS = "new", FILE = "mydata",
     &     FORM = "unformatted")
      DO 100 J = 1, 5
      DO 100 I = 1, 5
      DATA(I,J) = ((J-1) * 5) + (I-1)
100   CONTINUE
      WRITE(1) DATA
      END
Running this program creates a file mydata that contains the array of numbers.
PV-WAVE Read (Method 1)
The following PV-WAVE statements can be used to read this file and print its contents:
; Open the file. The F77_Unformatted keyword lets PV-WAVE know
; that the file contains binary data produced by a UNIX FORTRAN
; program.
OPENR, 1, 'mydata', /F77_Unformatted
; Create an array to hold the data. The command executes faster
; because the Nozero keyword disables the automatic zeroing of
; each value that normally occurs.
A = FLTARR(5, 5, /Nozero)
; Read the data in a single input operation.
READU, 1, A
; Print the result.
PRINT, A
; Close the file.
CLOSE, 1
Executing these PV-WAVE statements results in the following output:
 0.0000      1.0000     2.0000     3.0000     4.0000
 5.0000      6.0000     7.0000     8.0000     9.0000
10.0000     11.0000    12.0000    13.0000    14.0000
15.0000     16.0000    17.0000    18.0000    19.0000
20.0000     21.0000    22.0000    23.0000    24.0000
PV-WAVE Read (Method 2)
Because binary data produced by UNIX FORTRAN programs are interspersed with these “extra” longword record markers, it is important that the PV-WAVE program read the data in the same way that the FORTRAN program wrote it. For example, consider the following attempt to read the above data file one row at a time:
; Open the file. The F77_Unformatted keyword lets PV-WAVE know
; that the file contains binary data produced by a UNIX FORTRAN
; program.
OPENR, 1, 'mydata', /F77_Unformatted
; Create an array to hold one row of the array.
A = FLTARR(5, /Nozero)
; One row at a time.
FOR I = 0, 4 DO BEGIN
; Read a row of data.
READU, 1, A
; Print the row.
PRINT, A
ENDFOR
; Close the file.
CLOSE, 1
Executing these PV-WAVE statements produces the output:
0.00000 1.00000  2.00000  3.00000  4.00000
%End of file encountered. Unit: 1.
File: mydata
%Execution halted at $MAIN$   (READU).
This program read the single logical record written by the FORTRAN program as if it were written in five separate records. Consequently, it reached the end of the file after reading the first five values of the first record.
Portability Issues with Binary Files
64-bit versus 32-bit
C long integer data types occupy 8 bytes on 64-bit systems and 4 bytes on 32-bit systems. If you write out PV‑WAVE LONGs on a 32-bit system and want to read them back in on a 64-bit system, you need to either read the data into the INT32 data type, which is 4 bytes on all platforms, or use the READU_32 routine. Binary files containing PV‑WAVE LONGs written on a 64-bit machine cannot be read on a 32-bit system if the LONG values in the file are too large to fit into 32-bit LONGs. If the values do fit, you have to read them in pieces (as an 8-element BYTE array, for example) and discard the appropriate 4 bytes. Which 4 bytes to discard depends on the endianness of the machine that wrote the file, and is addressed in the next topic.
Endianness
Endianness, or byte order, refers to the internal representation of data on a particular machine: the order in which the bytes representing multi-byte data values are stored and read. The two types are big-endian and little-endian; which representation your machine uses depends on your hardware. Your machine writes binary data to a file in its own byte order, and a machine of a different type may read and interpret that data incorrectly. Unfortunately, there is no way to programmatically determine the byte order of a binary file, so you either have to know on what hardware the file was written or make sure that all binary data is written out in 'network order'. Network order is big-endian, and the /HtoNL (Hardware to Network Long) keyword to the BYTEORDER routine ensures your data is stored in this order, regardless of your current hardware. Convert all multi-byte numerical data values in this manner prior to storage, and convert them back with the /NtoHL keyword to BYTEORDER when they are read in.
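The idea behind network order can be sketched in C without any platform-specific headers. The following functions are hypothetical illustrations and do not show how BYTEORDER is implemented. The first reports the byte order of the host; the other two store and load a 32-bit value in big-endian order using shifts, so they give the same file contents on any hardware.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 on a little-endian host, 0 on a big-endian host. */
int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    unsigned char first;

    memcpy(&first, &probe, 1); /* look at the lowest-addressed byte */
    return first == 1;
}

/* Stores v in out[0..3] in network (big-endian) order. */
void put_network_long(uint32_t v, unsigned char out[4])
{
    assert(out != NULL);
    out[0] = (unsigned char)(v >> 24);
    out[1] = (unsigned char)(v >> 16);
    out[2] = (unsigned char)(v >> 8);
    out[3] = (unsigned char)v;
}

/* Loads a value stored in network (big-endian) order. */
uint32_t get_network_long(const unsigned char in[4])
{
    assert(in != NULL);
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8) | (uint32_t)in[3];
}
```

Because the conversion works on values rather than on memory layout, the writer and the reader do not need to know each other's byte order; they only need to agree that the file is in network order.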
 
Note: Binary dumps of structures containing LONG values on either a 32-bit or 64-bit system are not retrievable as a single unit by the other type of platform. You have to either write out the structure elements individually, read them back in individually, and use the values to populate the structure, or selectively read the bytes representing the structure. The first method effectively removes all of the hardware-specific padding between structure elements and is the preferred method. The second method requires a thorough understanding of how the structure elements were arranged in the original machine's memory. In either case, you are not able to read the structure from the file as a single unit.

Version 2017.1
Copyright © 2019, Rogue Wave Software, Inc. All Rights Reserved.