Filtering Sensitive Words in Spring Boot with SensitiveWord

Adding the dependency

 
<dependency>
  <groupId>com.github.houbb</groupId>
  <artifactId>sensitive-word</artifactId>
  <version>.2.0</version>
</dependency>

GitHub repository

Methods

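Method	Parameters	Return value	Description
contains(String)	String to check	boolean	Checks whether the string contains a sensitive word
replace(String, ISensitiveWordReplace)	String to process, replacement strategy	String	Returns the string with sensitive words replaced by the given strategy
replace(String, char)	String to process, replacement char	String	Returns the string with sensitive words replaced by the given char
replace(String)	String to process	String	Returns the string with sensitive words replaced by *
findAll(String)	String to check	List of strings	Returns all sensitive words in the string
findFirst(String)	String to check	String	Returns the first sensitive word in the string
findAll(String, IWordResultHandler)	String to check, IWordResultHandler result handler	List (element type depends on the handler)	Returns all sensitive words in the string
findFirst(String, IWordResultHandler)	String to check, IWordResultHandler result handler	Handler result	Returns the first sensitive word in the string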

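ISensitiveWordReplace: the sensitive-word replacement strategy.

IWordResultHandler: result handling. It post-processes each matched sensitive word and can be customized by the user. The built-in WordResultHandlers utility class provides two handlers:

WordResultHandlers.word(): keeps only the sensitive word itself.
WordResultHandlers.raw(): keeps the full match information, that is, the sensitive word together with its start and end index.

Default example

Use the methods provided out of the box.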

 
@Test
void testWord() {
  String text = "红旗迎风飘扬，主席的画像屹立在天安门前。";
  System.out.println(SensitiveWordHelper.contains(text));
 
  System.out.println(SensitiveWordHelper.replace(text));
  System.out.println(SensitiveWordHelper.replace(text, '0'));
 
  System.out.println(SensitiveWordHelper.findFirst(text));
  System.out.println(SensitiveWordHelper.findFirst(text, WordResultHandlers.word()));
  System.out.println(SensitiveWordHelper.findFirst(text, WordResultHandlers.raw()));
 
  System.out.println(SensitiveWordHelper.findAll(text));
  System.out.println(SensitiveWordHelper.findAll(text, WordResultHandlers.word()));
  System.out.println(SensitiveWordHelper.findAll(text, WordResultHandlers.raw()));
 
}

Output:

Init sensitive word map end! Cost time: 163ms

true

****迎风飘扬，***的画像屹立在***前。

0000迎风飘扬，000的画像屹立在000前。

红旗

WordResult{word='红旗', startIndex=0, endIndex=4}

[红旗, 主席, 天安门]

[WordResult{word='红旗', startIndex=0, endIndex=4}, WordResult{word='主席', startIndex=9, endIndex=12}, WordResult{word='天安门', startIndex=18, endIndex=21}]

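Custom replacement strategy example

This example uses a custom replacement strategy. First, implement the ISensitiveWordReplace interface to define the strategy: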

 
package com.tothefor;
 
import com.github.houbb.heaven.util.lang.CharUtil;
import com.github.houbb.sensitive.word.api.ISensitiveWordReplace;
import com.github.houbb.sensitive.word.api.ISensitiveWordReplaceContext;
 
public class MySensitiveWordReplace implements ISensitiveWordReplace {
 
  @Override
  public String replace(ISensitiveWordReplaceContext context) {
    String sensitiveWord = context.sensitiveWord();
    // Custom replacement per sensitive word; the mapping could be loaded from a database or elsewhere
    if ("红旗".equals(sensitiveWord)) {
      return "旗帜";
    }
 
    if ("天安门".equals(sensitiveWord)) {
      return "门";
    }
 
    if ("主席".equals(sensitiveWord)) {
      return "教员";
    }
    // Any other word falls back to * masking
    int wordLength = context.wordLength();
    return CharUtil.repeat('*', wordLength);
  }
 
}

Usage:

 
@Test
void testWord() {
  String text = "红旗迎风飘扬，主席的画像屹立在天安门前。";
  System.out.println(SensitiveWordHelper.contains(text));
  System.out.println(SensitiveWordHelper.replace(text, new MySensitiveWordReplace()));
 
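  // A second variable is needed: redeclaring text in the same method would not compile
  String cleanText = "最好的记忆不如最淡的墨水。";
  System.out.println(SensitiveWordHelper.contains(cleanText));
  System.out.println(SensitiveWordHelper.replace(cleanText, new MySensitiveWordReplace()));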
 
}

Output:

Init sensitive word map end! Cost time: 16ms

true

旗帜迎风飘扬，教员的画像屹立在门前。

false

最好的记忆不如最淡的墨水。

Customization

Open the SensitiveWordHelper source and you will find this line:


private static final SensitiveWordBs WORD_BS = SensitiveWordBs.newInstance().init();

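You will also notice that every method delegates to the SensitiveWordBs class. SensitiveWordHelper is therefore just a thin wrapper around SensitiveWordBs, provided so that developers can get started quickly in simple scenarios.

The creation statement above also shows that nothing extra is configured: it only initializes a default instance, which is the simplest possible setup. The next step is to build a custom SensitiveWordBs for sensitive-word filtering.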
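The wrapper relationship described above, a static helper that forwards every call to one pre-initialized instance, can be sketched in plain Java. `TinyWordBs` and `TinyWordHelper` are hypothetical stand-ins invented for this illustration. They are not the library's classes, and the matching logic is a naive substring scan over a fixed demo dictionary.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for a configurable engine (not the library's SensitiveWordBs). */
final class TinyWordBs {
    private List<String> words = Collections.emptyList();

    static TinyWordBs newInstance() {
        return new TinyWordBs();
    }

    /** Loads a fixed demo dictionary and returns this instance for chaining. */
    TinyWordBs init() {
        words = Arrays.asList("foo", "bar");
        return this;
    }

    boolean contains(String text) {
        for (String word : words) {
            if (text.contains(word)) {
                return true;
            }
        }
        return false;
    }

    /** Replaces every dictionary word with the same number of '*' characters. */
    String replace(String text) {
        String result = text;
        for (String word : words) {
            char[] mask = new char[word.length()];
            Arrays.fill(mask, '*');
            result = result.replace(word, new String(mask));
        }
        return result;
    }
}

/** Static facade: every method forwards to the single shared engine instance. */
final class TinyWordHelper {
    private static final TinyWordBs WORD_BS = TinyWordBs.newInstance().init();

    private TinyWordHelper() {
    }

    static boolean contains(String text) {
        return WORD_BS.contains(text);
    }

    static String replace(String text) {
        return WORD_BS.replace(text);
    }

    public static void main(String[] args) {
        System.out.println(TinyWordHelper.contains("a foo b"));
        System.out.println(TinyWordHelper.replace("foo and bar"));
    }
}
```

The facade is convenient for quick use, but because its instance is created with fixed settings, any custom configuration has to go through the engine class directly.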

Customizing SensitiveWordBs

The available configuration options are described below:

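No.	Method	Description
1	ignoreCase	Ignore case
2	ignoreWidth	Ignore half-width/full-width differences
3	ignoreNumStyle	Ignore how digits are written
4	ignoreChineseStyle	Ignore the Chinese writing style
5	ignoreEnglishStyle	Ignore the English writing style
6	ignoreRepeat	Ignore repeated characters
7	enableNumCheck	Whether to enable number detection. By default, 8 consecutive digits are treated as a sensitive word
8	enableEmailCheck	Whether to enable email detection
9	enableUrlCheck	Whether to enable URL detection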
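To make options such as ignoreCase, ignoreWidth and ignoreRepeat more concrete, here is a minimal sketch of the kind of normalization they imply. It is an illustration in plain Java, not the library's implementation: the class `TextNormalizer` and its methods are hypothetical names, and the sketch rewrites the text, whereas a real matcher would more likely ignore these differences while comparing.

```java
/** Hypothetical helper illustrating case, width and repeat normalization. */
final class TextNormalizer {

    private TextNormalizer() {
    }

    /**
     * Maps full-width ASCII variants (U+FF01 to U+FF5E) and the ideographic
     * space (U+3000) to their half-width counterparts.
     */
    static char toHalfWidth(char c) {
        if (c == '\u3000') {
            return ' ';
        }
        if (c >= '\uFF01' && c <= '\uFF5E') {
            return (char) (c - 0xFEE0);
        }
        return c;
    }

    /** Applies width folding, then lower-casing, to every character. */
    static String normalize(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            sb.append(Character.toLowerCase(toHalfWidth(text.charAt(i))));
        }
        return sb.toString();
    }

    /** Collapses each run of identical adjacent characters to one character. */
    static String collapseRepeats(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (sb.length() == 0 || sb.charAt(sb.length() - 1) != c) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Full-width "ABC123" written with Unicode escapes
        System.out.println(normalize("\uFF21\uFF22\uFF23\uFF11\uFF12\uFF13"));
        System.out.println(collapseRepeats("baaad"));
    }
}
```

With this folding, a dictionary entry written in lower-case half-width letters also matches input typed in upper case or full-width form, which is the effect the first two options describe.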

Then create the custom SensitiveWordBs as follows:

 
package com.tothefor.motorcode.core.SensitiveWord;
 
import com.github.houbb.sensitive.word.bs.SensitiveWordBs;
import com.github.houbb.sensitive.word.support.allow.WordAllows;
import com.github.houbb.sensitive.word.support.deny.WordDenys;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
 
@Configuration
public class SensitiveWordConfig {
 
  @Autowired
  private CustomWordAllow customWordAllow;
 
  @Autowired
  private CustomWordDeny customWordDeny;
 
  /**
     * Initializes the bootstrap class.
     *
     * @return the initialized bootstrap instance
     * @since.0.0
     */
  @Bean
  public SensitiveWordBs sensitiveWordBs() {
    // Options can also be decided dynamically, for example from database data
    return SensitiveWordBs.newInstance()
      .wordDeny(WordDenys.chains(WordDenys.system(), customWordDeny)) // set the blacklist
      .wordAllow(WordAllows.chains(WordAllows.system(), customWordAllow)) // set the whitelist
      .ignoreCase(true)
      .ignoreWidth(true)
      .ignoreNumStyle(true)
      .ignoreChineseStyle(true)
      .ignoreEnglishStyle(true)
      .ignoreRepeat(true)
      .enableEmailCheck(true)
      .enableUrlCheck(true)
      // any other options
      .init();
  }
 
}

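Here, wordDeny and wordAllow set the custom blacklist and whitelist of sensitive words. Each accepts either a single source or several sources chained together, as follows: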

 
// Option 1: system default word lists only
SensitiveWordBs systemBs = SensitiveWordBs.newInstance()
     .wordDeny(WordDenys.system()) // blacklist
     .wordAllow(WordAllows.system()) // whitelist
     .init();
 
// Option 2: custom word lists only
SensitiveWordBs customBs = SensitiveWordBs.newInstance()
     .wordDeny(new MyWordDeny())
     .wordAllow(new MyWordAllow())
     .init();
 
// Option 3: several sources chained together, system defaults plus custom
IWordDeny wordDeny = WordDenys.chains(WordDenys.system(), new MyWordDeny());
IWordAllow wordAllow = WordAllows.chains(WordAllows.system(), new MyWordAllow());
SensitiveWordBs chainedBs = SensitiveWordBs.newInstance()
     .wordDeny(wordDeny)
     .wordAllow(wordAllow)
     .init();
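Chaining can be pictured as merging several word sources into one. The sketch below shows that idea in plain Java under the assumption that a chain simply concatenates the lists returned by each source; `WordSource` and `WordSources` are hypothetical names for this illustration, not library types.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical word source: anything that can supply a list of words. */
interface WordSource {
    List<String> words();
}

final class WordSources {

    private WordSources() {
    }

    /** Returns a source whose word list is the concatenation of all given sources. */
    static WordSource chain(final WordSource... sources) {
        return () -> {
            List<String> all = new ArrayList<String>();
            for (WordSource source : sources) {
                all.addAll(source.words());
            }
            return all;
        };
    }

    public static void main(String[] args) {
        WordSource system = () -> Arrays.asList("alpha", "beta");
        WordSource custom = () -> Arrays.asList("gamma");
        System.out.println(chain(system, custom).words());
    }
}
```

Because the chained source queries its members each time `words()` is called, a member backed by a database would contribute its latest rows on every rebuild.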

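Next, add the custom word lists.

Custom whitelist

The whitelist defines which words must not be treated as sensitive: when one of them is encountered, it is displayed as-is. Implement the IWordAllow interface and override allow() to return the whitelisted words.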

 
package com.tothefor.motorcode.core.SensitiveWord;
 
import com.github.houbb.sensitive.word.api.IWordAllow;
import org.springframework.stereotype.Service;
 
import java.util.Arrays;
import java.util.List;
 
@Service
public class CustomWordAllow implements IWordAllow {
 
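  /**
     * Allowed content: the words returned here are not treated as sensitive words.
     * @return the whitelisted words
     */
  @Override
  public List<String> allow() {
    // Query the whitelisted words from the database here
    return Arrays.asList("红旗");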
  }
 
}

Custom blacklist

Add words to the blacklist by implementing the IWordDeny interface and overriding deny() to return the blacklisted words.

 
package com.tothefor.motorcode.core.SensitiveWord;
 
import com.github.houbb.sensitive.word.api.IWordDeny;
import org.springframework.stereotype.Service;
 
import java.util.Arrays;
import java.util.List;
 
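/**
 * Custom sensitive words
 */
@Service
public class CustomWordDeny implements IWordDeny {
 
  /**
     * Denied content: the words returned here are treated as sensitive words.
     *
     * @return the blacklisted words
     */
  @Override
  public List<String> deny() {
    // Query the custom sensitive words from the database here
    return Arrays.asList("红旗");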
  }
 
}

Example

Testing the custom configuration:

 
@Autowired
private SensitiveWordBs sensitiveWordBs;
 
@Test
void testWord() {
  String text = "红旗迎风飘扬，主席的画像屹立在天安门前。";
  System.out.println(sensitiveWordBs.contains(text));
 
  System.out.println(sensitiveWordBs.replace(text));
  System.out.println(sensitiveWordBs.replace(text, '0'));
  System.out.println(sensitiveWordBs.replace(text, new MySensitiveWordReplace()));
 
  System.out.println(sensitiveWordBs.findFirst(text));
  System.out.println(sensitiveWordBs.findFirst(text, WordResultHandlers.word()));
  System.out.println(sensitiveWordBs.findFirst(text, WordResultHandlers.raw()));
 
  System.out.println(sensitiveWordBs.findAll(text));
  System.out.println(sensitiveWordBs.findAll(text, WordResultHandlers.word()));
  System.out.println(sensitiveWordBs.findAll(text, WordResultHandlers.raw()));
 
}

Output:

true

红旗迎风飘扬，***的画像屹立在***前。

红旗迎风飘扬，000的画像屹立在000前。

红旗迎风飘扬，教员的画像屹立在门前。

主席

WordResult{word='主席', startIndex=9, endIndex=12}

[主席, 天安门]

[WordResult{word='主席', startIndex=9, endIndex=12}, WordResult{word='天安门', startIndex=18, endIndex=21}]

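As you can see, the result differs slightly from before: '红旗' is no longer filtered. The reason is that our custom whitelist contains '红旗'. But the blacklist contains the same word, so why is it not filtered? The point to note is that when the same word appears in both the blacklist and the whitelist, it is not filtered.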
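One way to model the behavior observed above is to treat the effective word set as the blacklist minus the whitelist. The sketch below is only that model in plain Java, not the library's internals; `EffectiveWords` is a hypothetical class, and ASCII placeholder words stand in for the words used in the example.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical model: a word is filtered only if it is denied and not allowed. */
final class EffectiveWords {

    private EffectiveWords() {
    }

    /** Returns the denied words with every allowed word removed, keeping deny order. */
    static Set<String> effective(Collection<String> deny, Collection<String> allow) {
        Set<String> result = new LinkedHashSet<String>(deny);
        result.removeAll(allow);
        return result;
    }

    public static void main(String[] args) {
        // "flag" is on both lists, so it drops out of the effective set
        Set<String> words = effective(
                Arrays.asList("flag", "leader", "gate"),
                Arrays.asList("flag"));
        System.out.println(words);
    }
}
```

Under this model the whitelist always wins, which matches the output above where only the two words that are not whitelisted are reported.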

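Resetting the word library

Initializing the word library is relatively expensive, so it is best done once at application startup. To let changes to the words take effect in real time while keeping the interface as simple as possible, trigger a re-initialization with sensitiveWordBs.init() whenever the word tables in the database change. Calling sensitiveWordBs.init() rebuilds the word library from IWordDeny + IWordAllow. Because initialization can take a while (on the order of seconds), it is designed so that the old word library keeps working until init completes, after which the new one takes over.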

 
@Autowired
private SensitiveWordBs sensitiveWordBs;

// Inside a method of this bean, after the word lists have changed:
sensitiveWordBs.init();

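Each time the data changes, first update the word tables in the database and then call this method. However, rather than calling it automatically right after every database modification, it is better to expose a separate endpoint and trigger it manually.

Summary

All operations are performed on SensitiveWordBs.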
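The reload behavior described above, where the old word library keeps serving requests until the rebuild finishes, can be sketched with an atomic reference swap. This is a minimal plain-Java illustration under that assumption, not the library's code: `ReloadableDictionary` and its loader are hypothetical, and matching is a naive substring scan.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

/** Hypothetical dictionary that is rebuilt off to the side and then swapped in. */
final class ReloadableDictionary {
    private final Supplier<List<String>> loader;
    private final AtomicReference<Set<String>> current =
            new AtomicReference<Set<String>>(Collections.<String>emptySet());

    ReloadableDictionary(Supplier<List<String>> loader) {
        this.loader = loader;
        reload();
    }

    /**
     * Builds a fresh snapshot first, then publishes it in one step.
     * Readers keep using the previous snapshot until the swap happens.
     */
    void reload() {
        Set<String> fresh = Collections.unmodifiableSet(new HashSet<String>(loader.get()));
        current.set(fresh);
    }

    boolean contains(String text) {
        for (String word : current.get()) {
            if (text.contains(word)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // The list plays the role of the database table holding the words
        final List<String> table = new ArrayList<String>(Arrays.asList("alpha"));
        ReloadableDictionary dictionary = new ReloadableDictionary(() -> table);

        table.add("beta");
        System.out.println(dictionary.contains("some beta text")); // old snapshot still in use
        dictionary.reload();
        System.out.println(dictionary.contains("some beta text")); // new snapshot published
    }
}
```

Keeping `reload()` as an explicit call fits the recommendation above: a dedicated endpoint can invoke it once after a batch of edits instead of rebuilding after every single change.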
