reading CAMECA files without software

Started by Ben Buse, October 08, 2025, 01:09:39 AM


Ben Buse

Hi

Does anyone have a script that will read CAMECA files and export them to a more useful format? I mean the wavescan files (wdsDat) and the quant data files.

It seems this will become important as more CAMECA instruments reach the end of their life: being able to access old data without maintaining old computers.

Ben

sem-geologist

#1
TL;DR:
A script for export? No.
A library for reading? Yes (it depends: you can read things, but you will need to work out how to export them into the form you want).

Their formats are binary, and the structure is dynamic. A script would be far too simple for such an extremely difficult task. I have reverse engineered the formats to some extent and described them using Kaitai Struct (the website of the RE tool is kaitai.io).
The repository with the parser description in Kaitai Struct code is here:
https://github.com/sem-geologist/peaksight-binary-parser

The Kaitai Struct code can be used to inspect single files in the Kaitai Struct Web IDE (a kind of very advanced hex editor), but it is impractical on its own. Kaitai Struct can be translated into many different programming languages, producing a parser library.

For a practical use of such a parser, generated for Python, look here:
https://github.com/sem-geologist/HussariX

BTW, what do you mean by "end"? These instruments are among the most repairable and upgradable machines on the market.

Ben Buse

#2
Thanks, I'll take a look at that. I've just been modifying https://stackoverflow.com/questions/31410043/hiding-lines-after-showing-a-pyplot-figure to plot a folder of exported wavescans (which I'd exported from the CAMECA software), allowing lines to be selected from the legend.

sem-geologist

Quote from: Ben Buse on October 08, 2025, 07:46:58 AM
Thanks I'll take a look at that. I've just been modifying https://stackoverflow.com/questions/31410043/hiding-lines-after-showing-a-pyplot-figure to plot a folder of exported wavescans (which I'd exported from cameca software), allowing selection of lines from legend,

I think I get what you are trying to achieve, and I think HussariX would let you do exactly that straight from the binary files without any export. Let me know, or PM me, if you need more help setting it up. Unfortunately it still has no installer, but as you are already familiar with Python scripts, setting up the libraries should not be very difficult for you. (Every time I get some spare time and can sit down to continue my software work, yet another hardware problem appears on one of our two EPMAs or their peripherals and takes away all my time.)

Ben Buse

OK, I've done the easy part: I followed your instructions, read the CAMECA wavescan file, and printed the comments for each sample, which works great. Now for the hard part: working out how to extract the x and y values. Any ideas?

dts = parsed_data.content.datasets
# if we want to print list of datasets with its setup name and comments:
for i in dts:
  print(i.header.setup_file_name.text, i.comment.text)
wavescan over beryllium4.wdsSet Beryl sk-2
wavescan over beryllium4.wdsSet Beryl sk-1
wavescan over beryllium4.wdsSet Chrysoberyl
wavescan over beryllium4.wdsSet b7 al2o3
wavescan over beryllium4.wdsSet cstd1 Quartz
wavescan over beryllium4.wdsSet Phenakite
wavescan over beryllium4.wdsSet herderite
wavescan over beryllium4.wdsSet b7 durango apatite
>>> i
<cameca.Cameca.Dataset object at 0x000001C8A6C757E0>
>>> i.__dict__

>>> i.__dict__.keys()
dict_keys(['_io', '_parent', '_root', 'header', 'items', 'comment', 'reserved_0', 'n_extra_wds_stuff', 'extra_wds_stuff', 'has_overview_image', 'polygon_selection', 'overview_image_dataset', 'polygon_selection_type', 'is_video_capture_mode', 'reserved_1', 'reserved_v17', 'image_frames', 'reserved_2', 'overscan_x', 'overscan_y', 'dts_extras_type', 'extras'])
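When poking around parsed objects like this, it can help to hide the underscore-prefixed bookkeeping attributes (`_io`, `_parent`, `_root`) that appear in the key list above. The helper below is only a sketch: `public_fields` is a made-up name, and `FakeDataset` is a stand-in class so the snippet runs without a CAMECA file.

```python
def public_fields(obj):
    """Return the names of an object's instance attributes,
    skipping private ones such as _io, _parent and _root."""
    return [name for name in vars(obj) if not name.startswith("_")]


class FakeDataset:
    """Stand-in for a parsed dataset object (hypothetical, for illustration)."""
    def __init__(self):
        self._io = None
        self._parent = None
        self.header = "header"
        self.items = []
        self.comment = "Beryl sk-2"


print(public_fields(FakeDataset()))
```

Note that `vars()` only sees attributes stored on the instance; anything implemented as a property on the class will not be listed, so `dir(obj)` may still be worth a look.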

Ben Buse

#5
And here opening up a quant point file

I've found I can extract the list of elements, but I have yet to find the weight percent values.

#prints atomic number of elements
for x in range (0,len(i.items)):
    print(i.items[x].signal_header.element.atomic_number)

This also contains
i.items[1].signal_header.__dict__.keys()
dict_keys(['_io', '_parent', '_root', 'element', 'xray_line', 'order', 'spect_no', '_raw_xtal', 'xtal', 'two_d', 'k', 'reserved_0', 'hv', 'beam_current', 'peak_pos', 'counter_setting'])

So parameters can be extracted as
for x in range (0,len(i.items)):
    print("z:", i.items[x].signal_header.element.atomic_number, "line:", i.items[x].signal_header.xray_line, "order:", i.items[x].signal_header.orde

Interesting FILL is LLiF and TEPL is LPet

Probeman

Quote from: Ben Buse on October 14, 2025, 06:20:12 AM
Interesting FILL is LLiF and TEPL is LPet

Maybe because the byte order is reversed on Motorola devices compared to Intel?  Big-endian vs. Little-endian:

https://en.wikipedia.org/wiki/Endianness
The only stupid question is the one not asked!

sem-geologist

#7
Reverse engineering technology is a bit like archaeology or geology: we see some final result and try to interpret how and why it looks the way it does, rather than the way we would expect.
So indeed, I also wondered about this reversed Xtal naming convention inside the binary files. I can see a few use cases where it simplifies identification. For context: modern Cameca Peaksight runs on Windows on the x86 architecture (even when installed on x86_64, the 64-bit OS) and uses little-endian byte order for persistent data on disk, the same as the OS and architecture. Why does that matter? The Xtal abbreviation is stored a bit differently from other text strings in these files: it has a fixed size of 4 bytes (so 4 letters). The Xtal string could therefore also be cast to a uint32 integer and used as a kind of enum. Moreover, because it is written in reverse, it is easy to check the Xtal family by comparing 3 of the 4 bytes, or with a bit mask. Compare `LPET` and `PET\x00`: index-wise they are completely different strings. Now look at the reversed forms `TEPL` and `TEP\x00` and, ta-da, the first 3 bytes are the same. And if we treat the value as an integer identifier, family-wise filtering takes a single very fast operation: a mask (`& 0xFFFFFF` on the little-endian value) or a shift (`>> 8` if the four bytes are read as big-endian). It is more elegant that way, so presumably it was left like that, as it makes sorting data by Xtal family easier if needed.
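The family check can be tried out with nothing but the Python standard library. This is a sketch, not code from the parser: the byte strings are the ones quoted in the thread, and the helper names `xtal_name` and `xtal_family` are made up. When the field is read as a little-endian integer, the first three stored bytes land in the low 24 bits, so a mask isolates them; a `>> 8` shift gives the same grouping if the bytes are read as big-endian.

```python
def xtal_name(raw: bytes) -> str:
    """Turn the stored 4-byte field into the familiar crystal name."""
    return raw.rstrip(b"\x00")[::-1].decode("ascii")


def xtal_family(raw: bytes) -> int:
    """Read the field as a little-endian uint32 and keep the low 24 bits,
    which hold the first three stored bytes (e.g. b"TEP")."""
    return int.from_bytes(raw, "little") & 0xFFFFFF


print(xtal_name(b"TEPL"), xtal_name(b"TEP\x00"), xtal_name(b"FILL"))

# Same family (PET), different family (PET vs LIF)
print(xtal_family(b"TEPL") == xtal_family(b"TEP\x00"))
print(xtal_family(b"TEPL") == xtal_family(b"FILL"))

# Read as big-endian instead, a plain shift does the same grouping
print(int.from_bytes(b"TEPL", "big") >> 8 == int.from_bytes(b"TEP\x00", "big") >> 8)
```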

Now for data access: it's very "easy", it just sits deep in the data structure. If your dataset is in the variable `i`, you can access it as:
i.items[index of WDS in dataset].signal
The signal will have further nodes depending on its type. For WDS scan data you can get the raw y values at .signal.data.bytes, which is raw bytes. (Hopefully you have read the introduction in my Kaitai code about the limitations of Kaitai and what is recommended to do in the target language.)
There is an easier and more pleasant way. You could copy the Python wrapper from the HussariX project; it overrides some of the Kaitai Struct types and sanitizes them for easier use from Python.
In particular this file: https://github.com/sem-geologist/HussariX/blob/master/lib/parsers/cameca.py
It won't work as is: you need to either get rid of the `Element` notation, which it imports from another file, or simply copy the whole Element class into that file. (Element is a very small class for using element abbreviations and atomic numbers interchangeably and for checking an element by one or the other. I know there are some bloated libraries that include this functionality, but in HussariX I used something minimal with a layout better suited to performance.) With that wrapper a lot gets easier for WDS: 3 different x and y scales can be produced (x in nm, keV or 100k·sinθ; y in cts, cts/s or cts/s/nA), and most importantly they come not as bytes but as performant numpy arrays. Also, text values no longer need .text and can be seen and used directly. It also reads the images from impDat, and does so lazily (it won't dump all the huge content into memory, only the data that is asked for).
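For anyone wanting to build the nm and keV scales by hand, here is a rough sketch using plain Bragg's law. It is not taken from HussariX. It assumes the spectrometer position is sin(theta) × 100000 and that the 2d spacing is given in ångström; it ignores any refraction correction (such as the `k` field in the signal header might represent). The function names and the example numbers are made up for illustration.

```python
HC_KEV_NM = 1.239841984  # h*c expressed in keV*nm


def sin_theta_pos_to_nm(pos, two_d_angstrom, order=1):
    """Convert a spectrometer position (sin(theta) * 1e5) to wavelength in nm
    using plain Bragg's law: n * lambda = 2d * sin(theta)."""
    sin_theta = pos / 1e5
    wavelength_angstrom = two_d_angstrom * sin_theta / order
    return wavelength_angstrom / 10.0  # 10 angstrom = 1 nm


def nm_to_kev(wavelength_nm):
    """Photon energy in keV for a wavelength in nm (E = h*c / lambda)."""
    return HC_KEV_NM / wavelength_nm


# Illustrative values only: position 50000 on a crystal with 2d of 8.75 angstrom
wl = sin_theta_pos_to_nm(50000, 8.75)
print(wl, nm_to_kev(wl))
```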

Furthermore, I gather you are set on using pyplot. I initially tried to do the same, but abandoned matplotlib for this purpose completely, as it performs terribly for interactive work. It is excellent for publication plots, but for interaction (adding and removing lines, rescaling, highlighting, switching between linear and log scales, etc.) it is slow and painful. That's why HussariX uses pyqtgraph for very fast plotting. It is also much easier to integrate with a GUI, as it is built on Qt.

As for wt%, you will also find it in the .signal branch, which for qtiDat will look quite different. Unless the wt% is in a wdsDat file, in which case bad luck: I have never used that Peaksight functionality and did not cover it in my RE attempt.

Ben Buse

#8
Thanks for your reply

So I took your linked file, copied the Element class into it, and changed
from .cameca_ks.cameca import Cameca, KaitaiStream
to
from cameca import Cameca, KaitaiStream
and saved it as cameca_read3.

Then ran

import cameca_read3
>>> cameca_read3.WdsScanSignal(i.items[0].signal.data.bytes)


with result

Process ended with exit code 3221225725.
The key is how to decode the bytes...
b'\xc3\xc8\x8fB\x02\x88\x89BB\xe9\x92B4\n\x9bBI)\x94B8\x07\x82B}J\x9cB\xde\x8b\xa7BVl\xa9BT\x10\xc3B\x19\xb4\xd7B\xd6\x12\xd2B#\x95\xdeB\x00\xf4\xd8B\xe6\x8e\x04C\xa4\xbb\xffB\x81{\xfeB[\xfb\xfbB\xc1\x10\x0cC\xed\xf9\xf6B&\x9f\x03C\xee\x00\rCv^\x02C\x99p\x0bC \xcf\x05C%\xe0\tC\xa1\xc0\x0bC4\xa2\x12CD\xe1\x0eC~\x92\x13C\xf4P\rC+\xd4\x19C\xd8\xe3\x18C\x8b\x18*CH\xb7%C\xa2\x9a1CI\n0C\x06\xdb2CaO@C&\xff?C\xe6\xd3LC\xff\t^CB\x9cdC\xadu{C\x13\x16\x85CB\xe4\xa1CdA\x9cC\xd8O\xa8C\xe6\x9c\xb0CA\xde\xbeCU&\xd7C\xde\xf1\xf9CQ\xe8\x11D\xd9\xc5"D\n\x0c<D\xe6HZD\x88\\sD\xe5\x9d\x90D\x1e\xb0\xb3D_\x08\xd5D\xb7i\xf1D\x99X\x02E\xb0\xf7\xfdD\xecj\xf1D\xe8O\xf3D\xa2,\xf1Df\xf2\xfeD\n(\x03Ep\xc8\xffD\xdf/\x06E\x18\xf9\x0bE\x12\xa5\nE\x99j\x07E\x93\x02\x00E\xfd%\xffDlH\xefD\x85\xf2\xd8DV\xc5\xbeD\xd5\xc6\x9fD\x848\x84DV\x86lD\x02\x0fOD\x89\xd38DOz\x1fD\xd8a\x0eDH\xb6\xfdC[\x92\xe5Cc:\xb9CiZ\x9dChd\x92C\xe4olC\x98\xe1FC\xa4x)C\xcc>\x04C:\xd8\xeeB\x05\x13\xd2B\rr\xccB\xe8.\xb7B\x1ap\xc2B;\xcc\xa8Bvk\xa4B\x9b-\xb2Bj\xad\xafBvk\xa4BR)\x94B\x16K\xa1B\xae\xa9\x96B\xf6K\xa6B\x17\xca\x99B\xcaI\x97Bi\xcb\xa3B\xb4,\xadB\xdch\x90B\x14*\x99B#\xcc\xa8B\x18\xca\x99B\xf5)\x99B\xf5h\x90B\xef)\x99BvH\x8dBJ)\x94B\xe3\x89\x98B(\x07\x82B\xfeG\x88B\xdeLvBB\xa7\x82B\x1e\xca\x99B4\x89\x93B\xf6\x08\x91B\x87\x0bfB\xb0\xa9\x96Bl\x87\x84B\xb6\x07\x87B\x0b\xc7\x80B\xe0g\x86B\xc8\xccsB\xe4h\x90B\xfc\xe7\x88Bu\x08\x8cB\x98\x87\x84B\x0eLlB4INB"\xe8\x88B\xa9\xc7\x85B2\x07\x82B\xbc\x0cuB\xc9\xa7\x87B\xb7\xccsBC\x07\x82BGh\x8bB\xb1\xa9\x96B)\xcdxB\xfb\x8bmB(\xcaZBX\xa7\x82B\xba\xc9UB\xb0\xcd}Bh\x8dwB\xb6\x8bhB\xf4&\x80B\xbeKgBH\xa7\x82B\x06\rpB\x92\xcbiB\x07\'\x80BY\tRBO\rzBP\xa7\x82BT\rzBn\x8crB^J]B\xf4\naB\xb8\xccsB]\x89OB\x92\xccsB\xce\naB\xde\x89TB\xa3J]B6\xc9PB\x8e\tRB\xb2\xcbiB\x9e\x89TB\xafISB\xe4\x88JBz\xc8FB]\x0bfB\xc3\xccsB\xaf\xca_B+MvB\x08KbB\xa8KgB\xf7\x08MB\x8a\x8a^B\x06KbBK\x88EB\xf6IXB\x0eMvB\xd6\xc9UBuJ]B`\n\\B\x8d\x89TBw\x88EB\xa3\x89TB\xce\x89TBG\xcbdBWKgBXM{B\x19\x8bcBD\x0cpB\xa8HIBkKgB\xcc\x0bkB}\xc
9PB\n\tMB7\x0e\x7fBcLqB\xd7KlBN\x08CB\xaeHIB\x0e\x8cmB \x08CB\x1b\xc7\x80BF\xc9PBcJ]BbJ]B\xb4\x88JB\xa4\xcbiB\xd8JbBB\x07\x82BoM{B\xaf\x0bkB\xb0\x0bkB\xc0\naBuG\x83B\xe7G\x88B7\xe8\x88B\xad\x87\x84B\xec\x8b\xa7B\xd4I\x97B\x8bk\xa4B\xac\xb3\xd7B\x859\xf3BHv\xe5B\xb0V\xe7BL\x92\xcfB\x1c\x99\xf2BLw\xeaB]\xdb\xfdB\xfb\x97\xedB\xc6\xb4\xdcB\x08\x95\xdeB\x00\x95\xdeB!6\xe4BLu\xe0B\x8as\xd6B-\xb1\xc8BVM\xb0B\xae\x8f\xc0B.\xca\x99B\x80\xac\xaaB\xdc\n\xa0B\xc6\xcd}B\xf4\x07\x87B=h\x8bB\xc4\naBZ\xcdxB\xa1\xccsB\xcc\x0bkB\xa4\xccsB\xc2\xca_B\xc9\tWB\xc6\r\x7fB8\xc8AB\x03\ruB\xff\x89YB\xeeIXBf\n\\BD\rzB/\x89OB\xfbIXB<\xcaZB\xc9\x0bkB\xcd\x88JB8HDB\x1d\xcaZB~\x08HB\xd4\x88JB)\x8cmB\xae\x89OB\x9e\x08HB\xa5\x07>B\xa3\xca_B.MvB\x02\'\x80B\xf6\x87@B5\tMB\xb5\x8d|B\x1aINB\xb0\xcd}B\x98\x89TB\xf9\x8bmB\x9e\xccsBC\x88EB+\x89OB[\x88EB\x92\xc9PB\nJXB\xb2\x0bkB\x0e\x8dwB\x80\x8a^B\xee\x8a^B\x81\x87;B\xd2\tWB\xde\x87@B\x0e\xccnB<\xc77B.\xcaZB8\xc8AB\x17\xcb_B\xa1J]BN\x8aTB\x8aISBy\xc8FB\xa6\x8aYB\xff\x08MB\x1c\x0baB\xb4\x08HB}\x8a^B\x1a\x876B\xe6\tWB\x83\tRB\xd3\x89TB\xa9\xca_B6\x08CBv\xcaZBBINB$\x89OB\xe6IXB\x04JXB"\xcbdBq\tRB\xb7\x08HB\xc0\x88JBc\x89OB\x99\xcbiB%\tMBo\tRB\x0b\x8aYB,\x89OB\x9e\x89TBsISB\xc4\x89TB\x07INBh\tRB\x04\xc9KB`\xc8FB*\xccnB\nJXB\xb0KgB\xdaKlB\xac\xca_B\xcc\naB\xc0HIB\xb8\naB\x02\x0baB{\x87\x84BD\n\\B\xa5\x0bfB\xba\x0cuB\xfaG\x88Br\rzB\xa1\x8d|B\xd7\xe9\x97BT\x08\x8cB$\xcc\xa8B\xc2m\xb3B`\x8e\xb6BNo\xbdBn\xb0\xc3B\x90\xf0\xc4BR\xb0\xc3B\x80\x10\xc3B^3\xd5B\x0eq\xc7Bs\xf2\xceB[\xf1\xc9B]\xcb\xa3B\'\xcc\xa8B\x9b\xed\xb0B\xe8j\x9fB\xd5\xca\x9eBz\xcb\xa3BR\xaa\x9bB)\x8bcB\t\x8cmB\x9cJ]B\xd4\xc9UB\x04\x8bcB\xf2\x0bkB\xd7\xc9UB\x8b\x87;B\xa3\x8a^B\xf6\x88JB\xe2\x87@B\xc6G?B2\xcaZB=\xc9PB\x0c\x8aYB&\xc77B\x8bISB\x8c\x08HB2\x0bfB\x8c\x89TBa\xc8FB\x16\x88@B\xda\xc9UB\x9c\x87;B\xa9\tRB\x18G5B\xc1\xc9UB\xc0\xc62B\xdf\x8a^B\x84\n\\B\xb2\x08HB\x8f\xc7<Bv\xc77Bp\xc8FB 
\x08CB\xbd\x07>B|\x08HB\xa1\x08HB\x06\xc8ABh\xc8FBJ\xc77B\x15\x08CB\x8c\xc7<B\r\x88@B\xfb\x08MB\xb0\xc9UB\xcb\xc62B\x89\x87;B\xf3F5B(\xc77B8H?B\xdc\x861B\xe4\tWBl\x079B"\tMB\x81\x079Bn\x87;B\x19\x074BM\x86,B5\x08CB{\xc6-B\x82\xc77Bt\x87;B\n\xc8ABVG5B\xf2\xc7AB\xd5\x05%B\x0b\tMB\x8b\x8a^Bh\tRB\xaa\xc7<Bc\x86,B,\x08>B\xc8\xc62B\x88\xc6-B\xb1\x07>By\xc9PB\xec\xc62B\xd8\x05%B\xc0G?B\x11\xc72B\xc9G?BO\xc77B\xb1\x07>BN\xcaZB<HDB\x83\x06/B\xe1\x04\x16B\x1e\x08CB\xa0F0B\xccHIB\x93\x06/B\xccF0BxHDB\xc5\x861B}G:B8\tMB\xe6E&B;\xc77B\x84\x08HB\xa8\xc7<Bp\x87;B\xdc\xc77B\xb9\x861B\x0b\x876B\x1a\xc6(B"\x05\x1bB\x14IIB\x91\xc8FBp\x079BfG:B\xed\x85"B:\x04\x0cB{\x87;BhF+BjHDB\xd4HDB\xa4\xc7<B\x85\x86,B\xb1\xc7<BP\x876B8\x06*B\xf4\x064B\xbb\x07>B\x94HIB1\x08CB0\xc6(B\xa2F0B}\x06/B \x05\x1bB)E\x1cB\'\x06*B\xc5\x07>BN\x86,B\xe7\x05%B\xc8ISBk\xc6-B\xec\xc8KB\xfa\x85\'B3G5B\xfb\x87@B\x90\x06/B\xd0\xc5#B\xef\x85\'B\xaaD\x12B\xb5\xc5#B\xbb\xc4\x14BU\x85\x1dB\x19\x88@B\xaa\x06/B\xc1\x87;BM\x86,B\xbe\x07>B\x8c\xc5\x1eBV\x079B\xe3E&Bf\x06*Bk\xc77B\x95F+B\x8eD\x12B;\xc77B\xb2\xc7<B\x08\xc5\x19B\xb4\xc5#B\xbe\x85"B\xaf\x88J...

Ben Buse

#9
OK, if I use the line

np.frombuffer(i.items[0].signal.data.bytes, dtype=np.float32)
(based on your code),

the bytes are decoded and I get the intensity column of the WDS spectrum as values rather than bytes, and I can plot them. Success!
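To see the decoding on its own, here is a small self-contained check using just the first 12 bytes of the dump posted above. The only change from the one-liner is the dtype string `"<f4"`, which spells out little-endian 32-bit floats explicitly; that the data are little-endian float32 is what the successful decode suggests, not something confirmed by a format specification.

```python
import numpy as np

# First 12 bytes of the dump posted above (three float32 values)
raw = b"\xc3\xc8\x8fB\x02\x88\x89BB\xe9\x92B"

# "<f4" = little-endian 32-bit float; makes the byte order explicit
counts = np.frombuffer(raw, dtype="<f4")

print(counts)
print(counts.size, len(raw) // 4)  # one value per 4 bytes
```

The array returned by `np.frombuffer` on a bytes object is read-only; call `.copy()` on it if the values need to be modified.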

Here i.items[0] holds the intensity values of the first spectrometer scan (e.g. LPET on spectrometer 1 for that sample),
i.items[1] the next crystal scan (e.g. LTAP on spectrometer 2), and so on.

The x values are then determined as
xvalues = list(range(i.items[0].signal.wds_start_pos,i.items[0].signal.wds_start_pos+(1000*int(i.items[0].signal.step_size)),int(i.items[0].signal.step_size)))
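The line above hard-codes 1000 points and truncates the step to an integer. A slightly more general sketch is shown below: it takes the number of points as an argument, so it can be set from the length of the decoded intensity data, and it leaves the step size untouched. The function name `wds_x_positions` and the numbers are made up for illustration, and whether `wds_start_pos` really is the position of the first point has not been confirmed in this thread.

```python
def wds_x_positions(start_pos, step_size, n_points):
    """Spectrometer positions for a scan: one x value per intensity value."""
    return [start_pos + k * step_size for k in range(n_points)]


# Made-up numbers purely for illustration
xs = wds_x_positions(start_pos=30000, step_size=10, n_points=5)
print(xs)
```

With real data, n_points could be taken as len(i.items[0].signal.data.bytes) // 4, assuming each intensity is a 4-byte float as in the decoding step.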