MS-KEBAB

OVERVIEW

Microsoft KnowledgE Base construction and Access Benchmark (MS-KEBAB) is source code for evaluating AI systems that construct and use knowledge bases. It does not provide data. It provides evaluation metrics that can assess whether the knowledge base contains the correct information in the correct format, and whether the system can effectively use the knowledge base to answer queries. The user is expected to provide output from the system and the desired output. MS-KEBAB performs the comparison between the two. To perform this comparison, MS-KEBAB defines a custom format for representing a knowledge base.

WHAT CAN MS-KEBAB DO

MS-KEBAB was developed to accelerate research by facilitating the comparison of approaches to knowledge base construction and use. Evaluating a knowledge base is difficult because information can be represented in many equivalent ways. Evaluation by exact string matching is too strict, eliminating systems that could be useful in practice. On the other hand, if the metrics are too lenient, then we may favor systems that are useless in practice because they tricked the metric.
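To make the strict-versus-lenient trade-off concrete, here is a small hypothetical sketch. It assumes facts are stored as (subject, relation, object) string triples. That representation and every function name below are illustrative assumptions, not MS-KEBAB's custom knowledge-base format or its matching logic. Exact matching rejects a correct fact that differs only in case and spacing. A subject-only comparison is so lenient that it accepts a wrong fact.

```python
# Hypothetical illustration only: NOT MS-KEBAB's format or matching logic.
from typing import Tuple

Triple = Tuple[str, str, str]  # assumed (subject, relation, object) representation


def exact_match(pred: Triple, gold: Triple) -> bool:
    """Too strict: every field must be character-for-character identical."""
    return pred == gold


def normalize(text: str) -> str:
    """Lowercase and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def normalized_match(pred: Triple, gold: Triple) -> bool:
    """Looser: compare each field after normalization."""
    return all(normalize(p) == normalize(g) for p, g in zip(pred, gold))


def subject_only_match(pred: Triple, gold: Triple) -> bool:
    """Too lenient: ignores the relation and the object entirely."""
    return normalize(pred[0]) == normalize(gold[0])


gold = ("Marie Curie", "born in", "Warsaw")
same_fact = ("marie curie", "Born  in", "Warsaw")   # same fact, different surface form
wrong_fact = ("Marie Curie", "born in", "Paris")    # different (incorrect) fact

print(exact_match(same_fact, gold))          # False: strict match rejects a correct fact
print(normalized_match(same_fact, gold))     # True
print(subject_only_match(wrong_fact, gold))  # True: lenient match accepts a wrong fact
print(normalized_match(wrong_fact, gold))    # False
```

Simple normalization still cannot recognize paraphrases such as "birthplace" versus "born in." Cases like these are where a semantic comparison, for example one backed by a language model, may be more appropriate.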

MS-KEBAB can use language models for some of its evaluations. It comes pre-configured to use:

INTENDED USES

MS-KEBAB is best suited for comparing different approaches to knowledge base construction and use, along the dimensions of recall and precision, for research purposes.
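As a rough illustration of the precision and recall dimensions, the hypothetical sketch below scores a predicted set of facts against a ground-truth set. It again assumes (subject, relation, object) triples and a normalized string comparison. These are assumptions for illustration, and MS-KEBAB's actual metrics and matching may differ. Precision is the fraction of predicted facts that match some ground-truth fact. Recall is the fraction of ground-truth facts that some prediction matches.

```python
# Hypothetical sketch: set-based precision/recall over facts.
# The triple representation and matching rule are assumptions, not MS-KEBAB's API.
from typing import Callable, List, Tuple

Triple = Tuple[str, str, str]  # assumed (subject, relation, object)


def normalize(text: str) -> str:
    """Lowercase and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def same_fact(a: Triple, b: Triple) -> bool:
    """Two triples match if every field is equal after normalization."""
    return all(normalize(x) == normalize(y) for x, y in zip(a, b))


def precision_recall(
    predicted: List[Triple],
    gold: List[Triple],
    match: Callable[[Triple, Triple], bool] = same_fact,
) -> Tuple[float, float]:
    # Precision: share of predicted facts that match at least one gold fact.
    correct_pred = sum(1 for p in predicted if any(match(p, g) for g in gold))
    # Recall: share of gold facts matched by at least one predicted fact.
    found_gold = sum(1 for g in gold if any(match(p, g) for p in predicted))
    precision = correct_pred / len(predicted) if predicted else 0.0
    recall = found_gold / len(gold) if gold else 0.0
    return precision, recall


gold = [
    ("Marie Curie", "born in", "Warsaw"),
    ("Marie Curie", "field", "physics"),
    ("Marie Curie", "spouse", "Pierre Curie"),
]
predicted = [
    ("marie curie", "born in", "Warsaw"),              # correct
    ("Marie Curie", "spouse", "Pierre Curie"),         # correct
    ("Marie Curie", "born in", "Paris"),               # wrong
    ("Marie Curie", "award", "Nobel Prize in Physics"),  # not in the ground truth
]

p, r = precision_recall(predicted, gold)
print(p, round(r, 3))  # 0.5 0.667
```

Passing a different `match` function lets the same scoring loop use a stricter or more semantic comparison. This simplified version does not enforce one-to-one matching, so duplicate predictions of the same correct fact each count toward precision.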

MS-KEBAB is intended to be used by domain experts who are independently capable of evaluating the quality of outputs before acting on them.

MS-KEBAB was designed to work in English.

OUT-OF-SCOPE USES

MS-KEBAB is not well suited for assessing the suitability of a system for a commercial or real-world application. There are many real-world scenarios and metrics that it does not test.

We do not recommend using MS-KEBAB in commercial or real-world applications without further testing and development. It is being released for research purposes.

MS-KEBAB was not designed or evaluated for all possible downstream purposes. Developers should consider its inherent limitations as they select use cases, and evaluate and mitigate for accuracy, safety, and fairness concerns specific to each intended downstream use.

MS-KEBAB should not be used in highly regulated domains where inaccurate outputs could suggest actions that lead to injury or negatively impact an individual's legal, financial, or life opportunities.

We do not recommend using MS-KEBAB in the context of high-risk decision making (e.g. in law enforcement, legal, finance, or healthcare).

HOW TO GET STARTED

To begin using MS-KEBAB, clone the repository and follow the instructions in the README.

EVALUATION

MS-KEBAB was evaluated on its ability to rank systems for knowledge base construction and use, measured against our own judgment of how such systems should be ranked.

EVALUATION METHODS

We took well-established datasets from the literature such as Re-DocRED and REBEL, applied well-known algorithms such as retrieval-augmented generation as well as our own research algorithms, and assessed the outputs against the provided ground truth.

We took a sophisticated system for knowledge base construction and compared it to a simpler one, verifying that MS-KEBAB ranked the sophisticated system higher.

We also looked at individual outputs from various systems and verified that MS-KEBAB agreed with our judgement of whether those outputs matched the desired output.

Results may vary if MS-KEBAB is used with a different language model, depending on that model's design, configuration, and training.

EVALUATION RESULTS

At a high level, we found that MS-KEBAB mostly agreed with a human expert in ranking systems.

LIMITATIONS

MS-KEBAB was developed for research and experimental purposes. Further testing and validation are needed before considering its application in commercial or real-world scenarios.

MS-KEBAB was designed and tested using the English language. Performance in other languages may vary and should be assessed by someone who is both an expert in the expected outputs and a native speaker of that language.

Outputs generated by AI may include factual errors, fabrication, or speculation. Users are responsible for assessing the accuracy of generated content. All decisions leveraging outputs of the system should be made with human oversight and not be based solely on system outputs.

There has not been a systematic effort to ensure that systems using MS-KEBAB are protected from security vulnerabilities such as indirect prompt injection attacks. Any systems using it should take proactive measures to harden their systems as appropriate.

BEST PRACTICES

We strongly encourage users to use LLMs that support robust Responsible AI mitigations, such as Azure OpenAI (AOAI) services. Such services continually update their safety and Responsible AI mitigations with the latest industry standards for responsible use. For more on AOAI’s best practices when employing foundation models for scripts and applications:

Users are responsible for sourcing their datasets legally and ethically. This could include securing appropriate copyrights, ensuring consent for use of audio/images, and/or the anonymization of data prior to use in research.

Users are reminded to be mindful of data privacy concerns and are encouraged to review the privacy policies associated with any models and data storage solutions interfacing with MS-KEBAB.

It is the user’s responsibility to ensure that the use of MS-KEBAB complies with relevant data protection regulations and organizational guidelines.

LICENSE

MIT License

TRADEMARKS

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft’s Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.

CONTACT

We welcome feedback and collaboration from our audience. If you have suggestions, questions, or observe unexpected/offensive behavior in our technology, please contact us by opening an issue on GitHub.

If the team receives reports of undesired behavior or identifies issues independently, we will update this repository with appropriate mitigations.