ASM Online Member Community

  • 1.  Data confidence

    Posted 02-26-2023 15:10

    Regarding data science-assisted computational materials science, how confident are we in existing databases? Who is curating the data? Are these people (or experts) truly qualified for the task? How confident are we in the error-trapping routines for "bad" data?



    ------------------------------
    Oscar Suarez
    Professor
    UNIVERSITY OF PUERTO RICO-MAYAGUEZ
    Mayaguez PR
    (787) 464-6739
    ------------------------------
    IMAT Conference & Expo


  • 2.  RE: Data confidence

    Posted 03-03-2023 08:00

    I would suggest @Lesley Frame, who may have some insight. Also, @James Saal, @David Furrer, and @Ji-Cheng Zhao.



    ------------------------------
    Carrie Hawk
    ASM International
    Community Engagement Specialist

    440-338-5497
    carrieh@asminternational.org
    ------------------------------


  • 3.  RE: Data confidence

    Posted 03-03-2023 10:48

    I personally don't take any collected dataset at face value. I check the original source for questionable points (e.g., points that deviate from a trend, are outliers, or regularly show large errors in a trained model). Often there are transcription errors (someone forgot a decimal point or the units are wrong), missing or wrong metadata (a study annealed a sample whereas all other points are as-quenched), or an original experiment that is generally untrustworthy (contaminated samples). Even if a dataset is curated by someone with expertise in the field, it shouldn't be blindly trusted. A lot of the early successes in materials informatics came from high-throughput DFT-based datasets because they don't suffer from these issues.
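
The first screening step in this post (finding points that deviate from a trend) could be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `flag_suspect_points` is hypothetical, the trend is assumed to be linear, the data values are made up, and the 3.5 cutoff on the modified z-score is a common rule of thumb rather than anything from this thread. It only lists points worth checking against the original source; it does not decide whether they are wrong.

```python
from itertools import combinations
from statistics import median


def flag_suspect_points(xs, ys, threshold=3.5):
    """Return indices of points that sit far from a robust linear trend.

    Hypothetical helper for illustration; the linear trend and the 3.5
    cutoff are assumptions, not something stated in the thread.
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")

    # Theil-Sen slope: the median of all pairwise slopes, which a few bad
    # points cannot drag around the way they can a least-squares fit.
    slopes = [
        (ys[j] - ys[i]) / (xs[j] - xs[i])
        for i, j in combinations(range(len(xs)), 2)
        if xs[j] != xs[i]
    ]
    if not slopes:
        raise ValueError("need at least two points with distinct x values")
    slope = median(slopes)
    intercept = median(y - slope * x for x, y in zip(xs, ys))

    # Distance of each point from the fitted trend, measured relative to
    # the median absolute deviation (MAD) of the residuals.
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    centre = median(residuals)
    deviations = [abs(r - centre) for r in residuals]
    mad = median(deviations)

    if mad == 0:
        # At least half the points lie exactly on the trend, so any
        # point off the trend is flagged.
        return [i for i, d in enumerate(deviations) if d > 0]

    # Modified z-score; 0.6745 rescales the MAD so it is comparable to a
    # standard deviation when the residuals are normally distributed.
    return [i for i, d in enumerate(deviations) if 0.6745 * d / mad > threshold]


# Made-up numbers: the fifth value looks like 50 entered as 5.0.
x_values = [1, 2, 3, 4, 5, 6, 7, 8]
y_values = [10.1, 19.8, 30.2, 40.0, 5.0, 60.1, 69.9, 80.2]
print(flag_suspect_points(x_values, y_values))  # prints [4]
```

A median-based fit is used here so that the bad points being hunted for do not distort the trend they are compared against. Metadata problems, such as an annealed sample mixed in with as-quenched ones, would not necessarily show up in a check like this and still need a look at the original source.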



    ------------------------------
    James Saal
    Director - External Research Programs
    Citrine Informatics
    ------------------------------
