Objects
Wang, Weijing, Chen, Junjie, Xu, Zhangwei, Dang, Yingnong, Zhang, Dongmei, Yang, Lin, Zhang, Hongyu, Zhao, Pu, Qiao, Bo, Kang, Yu, Lin, Qingwei, Rajmohan, Saravanakumar, Gao, Feng. Institute of Electrical and Electronics Engineers (IEEE); 2021. How long will it take to mitigate this incident for online service systems?
Jiang, Jiajun, Lu, Weihai, Chen, Junjie, Lin, Qingwei, Zhao, Pu, Kang, Yu, Zhang, Hongyu, Xiong, Yingfei, Gao, Feng, Xu, Zhangwei, Dang, Yingnong, Zhang, Dongmei. Association for Computing Machinery (ACM); 2020. How to mitigate the incident? An effective troubleshooting guide recommendation technique for online service systems.
Wang, Yaohui, Li, Guozheng, Xu, Zhangwei, Zhao, Pu, Qiao, Bo, Li, Liqun, Zhang, Xu, Lin, Qingwei, Wang, Zijian, Kang, Yu, Zhou, Yangfan, Zhang, Hongyu, Gao, Feng, Sun, Jeffrey, Yang, Li, Lee, Pochian. Institute of Electrical and Electronics Engineers (IEEE); 2021. Fast Outage Analysis of Large-Scale Production Clouds with Service Correlation Mining.
Li, Liqun, Zhang, Xu, Gao, Feng, Yang, Li, Lin, Qingwei, Rajmohan, Saravanakumar, Xu, Zhangwei, Zhang, Dongmei, Zhao, Xin, Zhang, Hongyu, Kang, Yu, Zhao, Pu, Qiao, Bo, He, Shilin, Lee, Pochian, Sun, Jeffrey. USENIX Association; 2021. Fighting the Fog of War: Automated Incident Detection for Cloud Systems.
Dong, Hang, Qin, Si, Abuduweili, Abulikemu, Ramanujan, Sanjay, Subramanian, Karthikeyan, Zhou, Andrew, Rajmohan, Saravanakumar, Zhang, Dongmei, Moscibroda, Thomas, Xu, Yong, Qiao, Bo, Zhou, Shandan, Yang, Xian, Luo, Chuan, Zhao, Pu, Lin, Qingwei, Zhang, Hongyu. Association for Computing Machinery (ACM); 2021. Effective low capacity status prediction for cloud systems.
Chen, Zhuangbin, Kang, Yu, Li, Liqun, Zhang, Xu, Zhang, Hongyu, Xu, Hui, Zhou, Yangfan, Yang, Li, Sun, Jeffrey, Xu, Zhangwei, Dang, Yingnong, Gao, Feng, Zhao, Pu, Qiao, Bo, Lin, Qingwei, Zhang, Dongmei, Lyu, Michael R. Association for Computing Machinery (ACM); 2020. Towards Intelligent Incident Management: Why We Need It and How We Make It.
Wang, Lu, Zhao, Pu, Zhang, Hongyu, Rajmohan, Saravan, Zhang, Dongmei, Du, Chao, Luo, Chuan, Su, Mengna, Yang, Fangkai, Liu, Yudong, Lin, Qingwei, Wang, Min, Dang, Yingnong. Association for Computing Machinery (ACM); 2022. NENYA: Cascade Reinforcement Learning for Cost-Aware Failure Mitigation at Microsoft 365.
Liu, Yudong, Yang, Hailan, Zhang, Chenjian, Wang, Paul, Dang, Yingnong, Rajmohan, Saravan, Zhang, Dongmei, Zhao, Pu, Ma, Minghua, Wen, Chengwu, Zhang, Hongyu, Luo, Chuan, Lin, Qingwei, Yi, Chang, Wang, Jiaojian. Association for Computing Machinery (ACM); 2022. Multi-task Hierarchical Classification for Disk Failure Prediction in Online Service Systems.