By Barry Levine / CIO Today. Updated August 06, 2013.
The term "xerox" has been used for decades to indicate an identical copy. But now a researcher has discovered that, in some cases, Xerox scanners are altering numbers on documents.
Last week, a PhD candidate at the University of Bonn posted several scans on his blog showing that the Xerox machines he tested had changed numbers in the documents they scanned. The researcher, David Kriesel, said he used a Xerox WorkCentre machine to scan building floor plans into PDFs. The documents contained construction drawings of rooms, each marked by a box showing the room's name and its area in square meters.
Kriesel reported that when he scanned the documents as TIFFs, they came out as exact replicas, but problems emerged when he used image compression on a Xerox WorkCentre 7535 and a WorkCentre 7556. Rather than showing their original areas, some rooms were labeled with area figures erroneously repeated from other rooms.
'A Lot Worse'
In the blog posting, Kriesel described the errors as "a lot worse" than an optical character recognition (OCR) problem, because "patches of the pixel data are randomly replaced in a very subtle and dangerous way." The scans look correct, Kriesel said, but the numbers may be wrong.
Kriesel said the problem was compounded by the fact that it occurs on different WorkCentre models running the current software release, and that Xerox appeared to be unaware of the issue until his report. He said he has received e-mails from other users of the same equipment and software who have encountered similar problems.
He indicated that the error appears to depend on both font size and scan resolution: he was able to reproduce it in PDF scans made at 200 dpi without OCR, using text set in 7-point and 8-point Arial.
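To put those settings in perspective, a font's nominal size can be converted into scanner pixels. This minimal sketch assumes the standard desktop-publishing point of 1/72 inch and treats the nominal font size as the height of the em box; digits are shorter than the em, so the actual numerals cover even fewer pixels.

```python
POINTS_PER_INCH = 72  # assumption: 1 desktop-publishing point = 1/72 inch


def em_height_px(font_size_pt: float, dpi: int) -> float:
    """Approximate height of a font's em box, in scanned pixels."""
    return font_size_pt / POINTS_PER_INCH * dpi


if __name__ == "__main__":
    for size in (7, 8):
        # Prints about 19.4 px for 7-point and 22.2 px for 8-point text.
        print(f"{size}-point text at 200 dpi: about {em_height_px(size, 200):.1f} px per em")
```

At 200 dpi, each digit in such small type is a tiny bitmap in which only a handful of pixels separate one numeral from another.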
Xerox has since confirmed his assessment that the problem stems from the way the scanner's JBIG2 image compression works: it looks for similar-looking areas throughout an image and reuses a single stored copy for all of them. Numbers in a small font that merely resemble one another are apparently being treated as identical, so the image of one digit is reused in place of another.
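JBIG2's symbol-dictionary coding lets an encoder store one bitmap per group of similar symbols and refer to it repeatedly; how similar is "similar enough" is left to the encoder. The following is a simplified, hypothetical illustration of that lossy pattern-matching idea, not Xerox's actual encoder: the 3x5-pixel glyphs, the `mismatch` measure, and the threshold values are all made up for demonstration.

```python
from typing import Dict, List, Optional, Tuple

Bitmap = Tuple[str, ...]  # one string per row; '#' = black pixel, '.' = white

# Hypothetical low-resolution digit bitmaps; "6" and "8" differ by one pixel.
GLYPHS: Dict[str, Bitmap] = {
    "6": ("###", "#..", "###", "#.#", "###"),
    "8": ("###", "#.#", "###", "#.#", "###"),
    "1": (".#.", "##.", ".#.", ".#.", "###"),
}


def mismatch(a: Bitmap, b: Bitmap) -> float:
    """Fraction of pixels that differ between two same-sized bitmaps."""
    total = sum(len(row) for row in a)
    diff = sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return diff / total


def encode(symbols: List[Bitmap], threshold: float) -> Tuple[List[Bitmap], List[int]]:
    """Store each symbol once; reuse a stored one if it is within the threshold."""
    dictionary: List[Bitmap] = []
    refs: List[int] = []
    for sym in symbols:
        match: Optional[int] = next(
            (i for i, d in enumerate(dictionary) if mismatch(sym, d) <= threshold), None
        )
        if match is None:  # nothing close enough: add a new dictionary entry
            dictionary.append(sym)
            match = len(dictionary) - 1
        refs.append(match)
    return dictionary, refs


def decode(dictionary: List[Bitmap], refs: List[int]) -> List[Bitmap]:
    """Rebuild the page by copying dictionary bitmaps, as a decoder would."""
    return [dictionary[i] for i in refs]


def read_back(bitmaps: List[Bitmap]) -> str:
    """Map decoded bitmaps back to digits so the result is easy to read."""
    lookup = {bm: ch for ch, bm in GLYPHS.items()}
    return "".join(lookup[bm] for bm in bitmaps)


if __name__ == "__main__":
    page = [GLYPHS[c] for c in "6818"]
    for threshold in (0.0, 0.1):
        dictionary, refs = encode(page, threshold)
        print(threshold, len(dictionary), read_back(decode(dictionary, refs)))
```

Running it prints `0.0 3 6818` and then `0.1 2 6616`: with the looser threshold the encoder saves a dictionary entry but silently turns every 8 into a 6, producing output that looks clean yet carries the wrong numbers.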
On Tuesday, Xerox released a statement saying that the problem appears to arise from a combination of compression level and resolution settings. The company said that the machines used by Kriesel "are shipped from the factory with a compression level and resolution that produces scanned files which are optimized for viewing or printing while maintaining a reasonable file size," and added that the defect may result from using lower quality and resolution settings.
Xerox said the character-substitution issue does not occur at the factory default settings, and it recommended that users keep those defaults, with the quality level set to "higher." It also said that warnings on the copier's Web site had noted for years that character substitution could occur at lower quality and higher compression settings.
JBIG2 compression is used only at the lowest quality setting, which the scanner labels "normal." Xerox has said that the default setting is "higher," but Kriesel said the machines he tested were set to "normal" by default, a setting he believes had been chosen by the reseller.